DOCUMENT RESUME

ED 215 328                                              CS 006 639

AUTHOR          Pikulski, John J., Ed.; Shanahan, Timothy, Ed.
TITLE           Approaches to the Informal Evaluation of Reading.
INSTITUTION     International Reading Association, Newark, Del.
REPORT NO       ISBN-0-87207-528-1
PUB DATE        82
NOTE            125p.
AVAILABLE FROM  International Reading Association, 800 Barksdale Rd.,
                P.O. Box 8139, Newark, DE 19711 (Order No. 528,
                $4.00 member, $6.00 non-member).
EDRS PRICE      MF01/PC05 Plus Postage.
DESCRIPTORS     Classroom Observation Techniques; Cloze Procedure;
                Content Area Reading; Elementary Secondary Education;
                *Evaluation Methods; Informal Assessment; *Informal
                Reading Inventories; Oral Reading; Reading
                Comprehension; *Reading Diagnosis; *Reading
                Instruction; Test Reliability; Test Validity; Word
                Recognition; Writing Skills

ABSTRACT
     The eight articles in this compilation provide various approaches
and techniques for use by classroom teachers in the informal
evaluation of student reading performance. The first article outlines
the many purposes for which informal measures may be used and briefly
describes the various forms such measures may take, while the second
focuses on teacher observation and addresses the testing concepts of
reliability and validity. The third article discusses the manner in
which oral reading should be evaluated and how measures of oral
reading should be interpreted, and the fourth offers suggestions for
evaluating decoding as well as comprehension skills. The fifth article
reviews the many forms that cloze techniques can take, illustrates the
breadth of informal evaluation procedures, and offers instructions for
the construction and interpretation of cloze tests. The sixth article
provides suggestions as to how content area teachers can use informal
procedures with their students, and the seventh illustrates the
interrelationships of the language skills by noting that if teachers
begin to analyze the writing students produce they will gain many
insights into the general language skills their students possess. The
final paper reviews research indicating the values and limitations of
informal reading inventories. (FL)

Foreword



What useful information is there in a standardized test given to
children all over the country in 30-45 minute sessions that a
skilled teacher does not know from interacting with a group of
pupils six hours a day, 180 days a year? Not very much, many
people feel.

The insightful teacher engages in continuous informal
evaluation as this teacher-pupil interaction takes place. Much
of this evaluation becomes a part of the teaching itself, a
continuous "kid-watching" as teachers observe their pupils
reading and responding to instruction. Some of it utilizes
self-constructed devices or published ones for observing and
analyzing what pupils are doing. These devices and procedures
vary considerably in their complexity, but all of them come
under the classification in this volume of informal evaluation.
They are informal in contrast to formal standardized or
criterion-referenced assessment tests. Beyond that they
obviously vary in just how formal or informal they are. They also
vary in terms of how much knowledge they require of the
teacher and how much control of the process of evaluation they
leave to the teacher.

Informal evaluation is done for a variety of purposes: to 
plan instruction, to place pupils at instructional levels, to
evaluate progress, to see strengths and weaknesses. 

This book brings together a group of scholars who
clearly know their informal reading evaluation. They present a
considerable range of such procedures, enough to extend the
teacher already committed to informal evaluation, enough to
inform those ready to begin informal evaluation, enough to
provide an important source of information for scholars in the
field.

The International Reading Association is proud to offer
this important contribution to reading evaluation.

Kenneth S. Goodman, President
International Reading Association
1981-1982

Introduction

It is not uncommon to meet teachers who feel very insecure about
testing or evaluating reading skills. Often they feel that the
results obtained on a group test of reading or on some brief
individual measure are inaccurate, but they feel powerless to
challenge these results. This volume represents a bringing
together of a large number of alternatives that can be used by such
teachers. None of the papers is suggesting use of a specific test;
instead they describe, in detail, procedures that teachers can use
flexibly and with a wide variety of materials to answer questions
that will be helpful in planning a reading instructional program
for a child or group of children. These various approaches and
techniques are all classified by the authors of this volume as
"informal evaluation procedures."

Virtually every comprehensive treatment of the measurement
of reading skills makes mention of "informal" approaches
to evaluation. As with so many terms in the field of reading
education, there are widely differing views of what informal reading
evaluation really is. This volume takes the position that the term
"informal evaluation" as applied to reading is a very broad one.
Johns, in the introductory paper, outlines the many purposes for
which informal measures may be used and briefly describes the
various forms that informal evaluation tools may take. Cunningham
looks at what ultimately may be the most powerful informal
evaluation tool: teacher observation. She particularly addresses
two central testing concepts, reliability and validity, that are all
too often avoided in discussions of informal reading evaluation.
She offers a convincing argument as to why the ongoing observation
that teachers can make, as they interact with pupils in an
instructional setting, may be the most reliable and valid approach
that one can take in the diagnosis of reading behavior.

The next five papers in the volume are somewhat more
specific in purpose and scope. Botel, clearly operating from a
theoretical framework that views reading as part of a larger
language communication process, translates theoretical perspective
and experimental results into specific suggestions for
evaluating decoding as well as comprehension. Hammond
specifically addresses the manner in which oral reading should be
evaluated and how measures of oral reading should be interpreted.
He too begins with a theoretical position and moves to
practical suggestions. He looks at informal evaluation from the
perspective of a psycholinguistic view of the reading process and
integrates the results of the research done in the area of miscue
analysis.

The paper by Pikulski and Tobin, which reviews the
various forms that cloze techniques can take, illustrates the
breadth of the procedures that the authors of this volume see as
falling under the heading "informal evaluation." Directions
offered for the construction and interpretation of cloze tests are
specific enough to serve as a useful guide to reading specialists
and classroom teachers interested in using this technique.

Hansell's paper offers suggestions as to how teachers in
content areas can employ a variety of informal evaluation
procedures to obtain information that will allow them to determine
the factors that may be limiting their students from obtaining
information through printed materials. All too often informal
evaluation is seen as a technique appropriate only for the
specially trained teacher of reading. Hansell shows that it can be
profitably and efficiently used by any teacher.

Though the points of view expressed and the specific
recommendations made in the various papers are not always in
agreement, a thread that runs through all of this is that reading
must be viewed as a linguistic process and as part of the larger
area of language and communication. Cramer specifically
illustrates the interrelatedness of the language skills by
suggesting that if teachers begin to diagnostically analyze the
writing that students produce, they will gain many insights into
general language skills and more specifically into the reading
skills that the students possess.

The final paper is somewhat different from the others
because, rather than offering suggestions for the administration and
interpretation of reading diagnostic procedures, it reviews the
evidence that exists regarding the values and limitations of a
specific form of testing, Informal Reading Inventories (IRIs).
Unlike the point of view taken in this volume, there are many who
would equate informal evaluation with IRIs, and, indeed, IRIs
remain one of the most common forms of reading evaluation used.
Pikulski and Shanahan review the research that addresses
questions surrounding the use of these very popular instruments.

In addition to the common thread of viewing reading as
being part of the language process, the papers in this volume have
at least one other quality: they are nondogmatic. They
consistently make modest claims for the techniques being
recommended; they consistently suggest that the results of any reading
evaluation must be viewed as tentative; they also consistently
suggest that unless the results are carefully, critically
interpreted by informed, capable reading specialists in the larger
framework of a student's day-to-day reading performance, the
results of an evaluation will be useless, and in some cases,
potentially destructive.

JJP
TS

The Dimensions and Uses of
Informal Reading Assessment

Jerry L. Johns
Northern Illinois University

Purposes for Evaluation 

The uses of informal evaluation vary considerably. While one
teacher might construct an informal reading test to assess a
student's ability to use maps in a social studies book, another
teacher might use students' performances on workbook pages to
help assess their ability to use context cues or other reading
strategies. This article provides an overview of selected informal
strategies for assessing reading and reading-related behavior. A
perspective on informal assessment is given, followed by
descriptions and examples of the major types of informal assessment
strategies.

Perspective on Informal Assessment in Reading

Informal tests and measures of reading performance vary
widely in their scope and sophistication. They also vary widely in
their validity and reliability, which tend to depend, to a large
degree, on the care given to their construction and the uses for
which they are employed.

Any type of assessment in reading must begin with clearly
defined purposes. There are at least four major purposes for
informal assessment: 1) studying, evaluating, or diagnosing reading
behavior; 2) monitoring student progress; 3) supplementing and
confirming information gained from standardized and
criterion-referenced instruments; and 4) obtaining information not readily
available from other sources.

As teachers develop, select, or use informal assessment
strategies, it is important that they keep their purposes in mind.
Teachers need to know whether their assessment focus is on
schools, classrooms, individuals, lessons, or programs. The range
of grades or areas of assessment should also be considered.
Generally, teachers tend to concern themselves with aspects of
their program of reading instruction, which implies that the
assessment strategies will be designed to assess students in a
classroom setting. There are also various levels of assessment.
The survey level focuses on global skills and abilities. The
specific level focuses on a particular skill or ability. The intensive
level concerns an in-depth appraisal of a student's reading
behavior and is often accomplished by a specialist in a clinic or
remedial setting.

Major Types of Informal Assessment Strategies 

The inner-ocular technique. For years teachers have been
using what this author has chosen to call the inner-ocular
technique (IOT) for assessing and monitoring the reading program
(Johns, 1979a). The term was invented in hopes that this
pseudoscientific abbreviation might help teachers legitimize something
they have always done: use observation skills to help determine
whether their instruction is producing the desired results. This
form of evaluation is what Cunningham has referred to in this
volume as "diagnosis by observation." Careful and systematic
observation can help teachers place students in appropriate
materials, assess readiness for a given task, determine reading
interest, assess attitudes, and make decisions about decoding,
comprehension, and study skills. When teachers put their
observation skills to work, they employ a powerful form of assessment.
Perhaps one of the most compelling reasons for using the IOT is
that it provides a continuous method to monitor or evaluate the
student's successes and failures in important components of the
reading program. The IOT is a dynamic process that builds on
day-to-day behavior. A detailed discussion of the value and forms
that this type of assessment takes is included in the next chapter
of this volume.

Conferences. Related to the IOT is the teacher-student
conference. Such conferences, while brief, can help the teacher
become better acquainted with the student, assess attitudes toward
reading, uncover strategies for reconstructing meaning from
print, explore possible interests, discuss the book the student is
reading, assess oral reading (note that round-robin oral reading is
not used), and explore the student's notions about reading.

Conferences can frequently be strengthened by making
notes after the conference or using a checklist of items frequently
discussed. Notecards, folders, or notebooks have been used
successfully by teachers to keep records. Hill (1979) has provided an
extensive set of questions that can be used with older students
during conferences to assess their reading-study habits.

Informal reading inventory. Perhaps the most widely
known form of systematic informal assessment is the informal
reading inventory (IRI). An IRI is an individually administered
reading test composed of a series of graded word lists and graded
passages that the student reads aloud to the teacher. As the
student reads, the teacher notes oral reading errors or miscues such
as mispronunciations, omissions, repetitions, and substitutions.
After the oral reading, the teacher asks the student
comprehension questions. Silent reading passages and passages read to the
student to determine a listening comprehension level, both
accompanied by comprehension checks, are also usually included.

The student's performance on the IRI forms the basis for
establishing a student's independent, instructional, and
frustration reading levels as well as strengths and weaknesses in word
recognition and comprehension. Perhaps the most important use
of the IRI is to help the teacher match the student's reading
ability with appropriate instructional materials. Some educators
believe that as many as 50 to 70 percent of students are placed in
books that are too difficult. Matching students with the
appropriate difficulty level of reading materials, therefore, may be one
of the most important actions a teacher can take to improve
instruction.

In addition to determining the proper level for instruction,
teachers can also use the results of an IRI to better understand
the student's word attack and comprehension strategies. Areas
frequently evaluated include context and language cues, phonic
cues, structural analysis, and the ability to answer various types
of comprehension questions. Once the student's strengths and
weaknesses have been determined, appropriate reading strategy
lessons may be developed. The following sources provide
guidelines for the preparation of strategy lessons: Allen and Watson
(1976), Christie (1979), Gillespie-Silver (1979), Johns (1975),
Maring (1978), and Spiegel (1978).

Teachers may either construct their own IRIs or purchase
commercially published inventories. In 1977, Johns and others
prepared a list of published reading inventories. Some of the IRIs
that have been published since 1977 include the following:

Analytical reading inventory, 2nd ed. (primer through 9).
Charles E. Merrill Publishing, 1300 Alum Creek Drive,
Columbus, Ohio 43216.

Advanced reading inventory (grades 7 through college).
William C. Brown Company, 2460 Kerper Boulevard, Dubuque,
Iowa 52001.

Basic reading inventory, 2nd ed. (preprimer through 8).
Kendall/Hunt Publishing, 2460 Kerper Boulevard, Dubuque,
Iowa 52001.

Ekwall reading inventory (preprimer through 9). Allyn and
Bacon, 470 Atlantic Avenue, Boston, Massachusetts 02210.

Informal reading assessment (preprimer through 12). Rand
McNally, Box 7600, Chicago, Illinois 60680.

For those who are interested in preparing their own IRIs,
the following sources are recommended: Johnson and Kress
(1965), Leibert (1969), Valmont (1972), and Zintz (1975). Research
related to the effectiveness of informal reading inventories is
summarized in this volume in the chapter by Pikulski and
Shanahan.

Cloze procedure. The cloze procedure is yet another form of
informal evaluation that can be used for a variety of assessment
purposes. Generally, it involves omitting words from paragraphs
of material, replacing the omitted words with blanks of uniform
length, and asking students to fill in the omitted words.
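
The general procedure just described can be sketched in a few lines of Python. This is only an illustration, not the construction method detailed by Pikulski and Tobin: the function name make_cloze, the choice to delete every nth word, and the ten-character blank are all assumptions made for the example.

```python
def make_cloze(text, every_nth=5, blank="_" * 10):
    """Build a cloze passage by replacing every nth word with a uniform blank.

    Assumption for illustration: words are deleted at a fixed interval.
    Returns the cloze passage and the list of deleted words (the answer key).
    """
    answer_key = []
    output_words = []
    for position, word in enumerate(text.split(), start=1):
        if position % every_nth == 0:
            # Keep trailing punctuation so sentence boundaries stay visible.
            core = word.rstrip(".,;:!?")
            trailing = word[len(core):]
            answer_key.append(core)
            output_words.append(blank + trailing)
        else:
            output_words.append(word)
    return " ".join(output_words), answer_key


passage, key = make_cloze(
    "The cat sat on the mat and looked at the dog.", every_nth=3
)
print(passage)
print(key)
```

Every blank has the same length regardless of the word it replaces, so the blank itself gives the student no clue about the omitted word.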

Teachers who wish to use the cloze procedure for
evaluating the suitability of reading materials should refer to the
chapter in this volume by Pikulski and Tobin. It includes details
related to constructing, scoring, and interpreting cloze tests.

Teachers can use cloze informally to help teach students
how to use context cues and to improve their comprehension
(Jongsma, 1980). The general procedure is to delete words in
some rational manner; e.g., verbs or nouns. Students are then
told to fill in a word that makes sense. After words have been
supplied, a discussion of the appropriateness of students' responses
occurs. When cloze is used to teach context cues or to improve
comprehension, synonym scoring of responses is recommended.
Teachers can ask students whether the word(s) they suggest
make sense in the context of the sentence, paragraph, or passage.

Teachers may find the work of Rankin (1977) particularly
helpful in developing strategies for introducing the cloze
procedure, selecting reading passages and word deletions, using
visual cues, and providing appropriate reinforcements. An
annotated bibliography on the cloze procedure has been prepared by
McKenna and Robinson (1980). Other recent sources for helping
teachers use the cloze procedure include Arnold and Miller (1980)
and Ekwall (1976).
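
The difference between exact-word scoring and the synonym scoring recommended above can be shown with a short Python sketch. The function score_cloze and the synonym table are hypothetical; in practice the teacher, not a fixed list, judges whether a response makes sense in context.

```python
def score_cloze(responses, answer_key, synonyms=None):
    """Count acceptable responses on a cloze exercise.

    synonyms is a hypothetical, teacher-supplied mapping from each deleted
    word to other responses judged to make sense in context. With no
    mapping, only the exact deleted word is counted.
    """
    synonyms = synonyms or {}
    correct = 0
    for response, answer in zip(responses, answer_key):
        accepted = {answer.lower()}
        accepted.update(word.lower() for word in synonyms.get(answer, []))
        if response.strip().lower() in accepted:
            correct += 1
    return correct


key = ["big", "ran", "house"]
student = ["large", "ran", "car"]
print(score_cloze(student, key))                              # exact-word scoring
print(score_cloze(student, key, {"big": ["large", "huge"]}))  # synonym scoring
```

The same set of responses earns a higher count under synonym scoring, which is why it suits instructional use, where the goal is sensible use of context rather than recovery of the author's exact word.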

Attitude inventories. More and more teachers realize that a
reading program, if it is to be successful, must have at least two
major goals: to teach students how to read, and to create
students who want to read. Measurements of students' attitudes
become important if the second goal of the reading program is to
be achieved.

Attitude surveys represent one way of obtaining some
notion of students' attitudes toward reading. In the primary
grades, statements like the following could be read to students
who could respond by circling yes/no on an answer sheet or
circling the appropriate face:

I can read as fast as the good readers.
I like to read.
I like to read long stories.
The books I read in school are too hard.
I need more help in reading.
I worry quite a bit about my reading in school.
I read at home.
I would rather read than watch television.
I am a poor reader.
I like my parents to read to me.

In the middle grades, students could respond to the following
ten statements by circling agree, undecided, or disagree on
their answer sheets:

Reading is a good way to spend spare time.
Most books are too long and dull.
There should be more free reading in school.
Reading is as important as watching television.
Reading is boring.
Reading is rewarding to me.
I think reading is fun.
Teachers ask me to read books that are too hard.
I am a poor reader.
My parents spend quite a bit of time reading.

There are several questionnaires that have been published
to help measure students' attitudes toward reading.
Questionnaires that may be of interest to teachers include:

Askov, Eunice N. Primary pupil reading attitude
inventory. Dubuque, Iowa: Kendall/Hunt, 1973.

Estes, Thomas E. A scale to measure attitudes toward
reading. Journal of Reading, November 1971, 15, 135-138.
Further validation of this scale can be found in Kenneth L. Dulin and
Robert D. Chester, A validation study of the Estes attitude scale,
Journal of Reading, October 1974, 18, 56-59.

Heathington, Betty S., and Alexander, J. Estill. A
child-based observation checklist to assess attitudes toward reading.
Reading Teacher, April 1978, 31, 769-771.

LaPray, Margaret. Helping children to become
independent readers. New York: Center for Applied Research in
Education, 1972.

Rowell, C. Glennon. An attitude scale for reading. Reading
Teacher, February 1972, 25, 442-447.

Tullock-Rhody, Regina, and Alexander, J. Estill. A scale
for assessing attitudes toward reading in secondary schools.
Journal of Reading, April 1980, 23, 609-614.

Vaughan, Joseph L. Jr. A scale to measure attitudes
toward teaching reading in content classrooms. Journal of
Reading, April 1977, 20, 605-609.

Interest inventories. Most interest inventories consist of a
series of questions or incomplete sentences that help teachers
find out such things as students' likes, dislikes, hobbies,
interests, family activities, and use of free time. The inventories
can be administered orally or in written form. One of the major
reasons for administering an interest inventory is to gain
information to help use instructional techniques and books
appropriate to students' interests and needs.

One informal student interest inventory is in the form of a
news story. Incomplete sentences help students write about their
family, friends, pets, wishes, travels, hobbies, television, and
books. Several sample incomplete sentences include:



My father and I like to ________.
I would like to have a pet ________.
I do not like ________.
My ________ reads to me.
One of my hobbies is ________.



In addition, some of the following questions may be useful:
What kinds of books or stories do you like?
What books or magazines do you have at home?
Which comic books do you like to read?

Numerous interest inventories are available for use or
adaptation by teachers in the following sources:

Farr, Roger, and Roser, Nancy. Teaching a child to read.
New York: Harcourt Brace Jovanovich, 1979.

Harris, Larry A., and Smith, Carl B. Reading instruction,
3rd ed. New York: Holt, Rinehart and Winston, 1980.

Strickler, Darryl, and Eller, William. Attitudes and
interests. In Pose Lamb and Richard Arnold (Eds.), Teaching
Reading. Belmont, California: Wadsworth, 1980.

After the student has completed an inventory, the teacher
can review and study the responses to get clues about interest
patterns. Because interest and reading preferences are largely
individual and subject to change, teachers should use caution in
drawing conclusions.

Workbooks and worksheets. The many worksheets and
workbook pages included in most reading programs provide
another means for assessing reading skills on a regular basis. The
workbook pages and worksheets are designed to provide practice
with a particular reading skill, such as selecting the main idea of
a short passage. If that worksheet is composed of a series of short
passages and several statements (one of which is the main idea),
the teacher can use the exercise as one method to determine
which students may need additional instruction with the skill of
identifying main ideas.

An advantage of worksheets and workbook pages is their 
accessibility. Carefully selecting appropriate workbook or
worksheet exercises will provide the teacher with an ongoing 
means of assessment that; if properly used, can help evaluate the 
effectiveness of skills instruction. 

Other informal measures. There are several additional
informal means of gathering information to aid in assessing
reading: cumulative records, student-kept records, and numerous
other informal tests.

Cumulative records are one means for developing a
longitudinal view of the student's reading. Such records usually
contain test results (standardized and informal), observations by
previous teachers, health and family information, attendance
records, books read, and special instruction that has been given.
Although cumulative records sometimes contain vague and
somewhat subjective materials, they can sometimes provide
insight for instruction or suggest an interest area that can be used
to motivate the student's reading.

Student-kept records can be initiated by teachers to help
the student keep track of books read, favorite stories, scores on
workbooks and worksheets, or progress in various learning centers.
These records may provide insights into numerous areas of the
reading program.

There are a host of informal tests that the teacher can
construct to assess prereading, decoding, and comprehension skills.
Some of these informal tests require little effort to construct
while others demand several hours. For example, a teacher who
wishes to determine which students may need instruction in the
thirteen most common basic sight words (Johns, 1979b) could
merely prepare a card containing the words (a, and, for, he, in, is,
it, of, that, the, to, was, you). As the student says the words, the
teacher records responses on a simple record sheet.
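
A record sheet of this kind could be kept in many ways; the Python sketch below is one hypothetical version. The word list comes from the paragraph above, while the function record_sight_words and the shape of its output are assumptions made for illustration.

```python
# The thirteen basic sight words listed in the text (Johns, 1979b).
SIGHT_WORDS = ["a", "and", "for", "he", "in", "is", "it",
               "of", "that", "the", "to", "was", "you"]


def record_sight_words(responses):
    """Summarize one student's performance on the sight-word card.

    responses is a hypothetical mapping from each word shown to what the
    student actually said; a word with no entry is treated as not attempted.
    """
    missed = [
        word for word in SIGHT_WORDS
        if responses.get(word, "").strip().lower() != word
    ]
    return {"known": len(SIGHT_WORDS) - len(missed), "missed": missed}


# Example: the student reads "was" as "saw" and does not attempt "that".
said = {word: word for word in SIGHT_WORDS}
said["was"] = "saw"
del said["that"]
print(record_sight_words(said))
```

Recording what the student actually said, rather than only right or wrong, preserves the kind of miscue information (here, a reversal of "was") that the teacher can later use in planning instruction.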

In the upper grades, informal tests of study skills and the
ability to use the textbooks are sometimes constructed by
teachers. Teachers in the upper grades may find the following
sources useful for constructing informal tests that help evaluate
whether students can profit from and effectively use content area
materials.

Karlin, Robert. Teaching reading in high school, 3rd ed.
Indianapolis: Bobbs-Merrill, 1977.

Viox, Ruth G. Evaluating reading and study skills in the
secondary classroom: A guide for content teachers. Newark,
Delaware: International Reading Association, 1968.

Summary 

Various informal techniques have been described to help
teachers assess students' reading behavior. No single method of
assessment is sufficiently valid or reliable that it alone should
form the basis of assessment.

Teachers need to realize that informal tests represent one
part of a balanced assessment program. Standardized,
diagnostic, and criterion-referenced tests should also be used.
Without this balance, instruction may become misdirected
which, in turn, may work against helping students become
efficient and effective readers. All teachers use some of these
informal techniques. Teachers need to remember that informal
assessment techniques are a legitimate means to gain insights into the
teaching of reading.

To help teachers use the informal strategies described in
this article, the chart on page 9 may be useful in showing some
of the major areas in the reading program that may be
evaluated with informal measures.

Diagnosis by Observation

Patricia Cunningham
Wake Forest University

"Good morning, Ms. Jones. This is Johnny. Johnny just moved
into our school district and he is going to be in your third grade."
The principal smiles and exits, leaving Johnny with Ms. Jones.
Ms. Jones puts an arm around a frightened Johnny and leads him
into the classroom. She appoints two of her more capable,
congenial students to be Johnny's special friends for the week and
makes a special effort to ensure that Johnny becomes a part of
the classroom as soon as possible. Meanwhile, Ms. Jones wonders
about Johnny. "On what level does he read and in which group
should he be placed?" "What skills has he mastered and on which
ones does he currently need to work?" "Is he a child who can
identify words better than he can comprehend what he is reading,
or is he a child who has trouble identifying words but makes good
use of those words he can identify?"

These and many other questions go through Ms. Jones'
mind as she watches Johnny become acclimated to her classroom.
How will she answer these questions? Perhaps there will be some
useful information in the records that will come from his old
school. She may be able to gain additional information by giving
Johnny some standardized or teacher-made tests. The answers to
most of Ms. Jones' questions, however, will not magically appear
as numbers on a score report or as right and wrong answers on a
test. The answers to most of her questions will appear as Ms.
Jones interacts with and systematically observes Johnny on a
day-to-day basis. This system of diagnosis is sometimes labeled
in reading textbooks as "diagnosis by observation." Teachers
refer to it differently. "I don't know how I do it," replies a
particularly effective teacher, when asked how he knows which
children need what when. "I just follow my intuitions," replies
another teacher. "Doing what comes naturally," says a third
teacher.

Good teachers know the reading needs of their students.
The problem is that teachers often are unable to articulate how
they know and, consequently, are unable to share their talents
with others. I know this is true because, for many years, I was
one of those teachers who "had a feel for what to do" and was at a
loss to communicate this feeling to my principal, supervisor, or
other interested teachers. For the most part, principals and
supervisors, seeing that I achieved good results with children,
left me alone to do it in whatever mysterious way I could. Today,
however, I don't believe I would be granted that freedom.
Intuition is out. Criterion-referenced tests, behavioral objectives, and
management skills systems are in. The implementation of these
"grand plans for reading success" was probably well intentioned.
There are teachers who lack "intuition" and a feel for what to do
next. These teachers simply could not provide each child with
appropriate instruction because they didn't know what to do. So,
pencil and paper tests were designed and keyed to objectives and
materials. "Intuition" would no longer be a requisite for good
instruction. One need only administer and score the tests, prescribe
the appropriately keyed lessons, and administer some more tests.
This test, prescribe, test, prescribe sequence could then be
continued over and over again. Unfortunately, these grand plans
have not worked out as well in practice as one might have
expected them to. While they seemed to provide a workable system
for the teacher who did not know how to use observation
effectively, the system was far from foolproof. When a child seemed to
have mastered skills and still wasn't learning to read, the naive
teacher still didn't know what to do even when test results were
available. Worse, however, was the damage done to the teaching
of the intuitive teachers. Unable to explain how what they were
doing worked, intuitive teachers often felt forced to adopt
systems which seemed objective and precise but which often proved
to be less effective and more time consuming.

Occasionally, I have voiced my concerns to some of my
colleagues who are advocates of skills management systems. Their
responses often are that they share my concern that the intuitive
teacher is being stifled and they know that no system, even the
one they are advocating, is foolproof; however, they also maintain
it is unreasonable to expect them to accept an "intuitive" system
that can't even be explained. Although some teachers can "just
do it," others can't. Can this latter group be taught to "do it"?
The remainder of this article will be devoted to accepting the
challenge of explaining intuitive teaching, which can also be
called "diagnosis by observation."

Reliability and Validity of Diagnosis by Observation 

When a test maker or a curriculum specialist attempts to
sell a test to teachers or administrators, the terms validity and
reliability are sure to be bandied about. Often, these technical
sounding terms are accompanied by some impressive sounding
numbers. "This test has clearly established concurrent validity
and a test-retest reliability of .88," is the kind of argument
intuitive teachers find hard to refute. Assessment must, indeed, be
valid and reliable if it is going to help us make instructional
decisions. Intuitive teachers have a knack for making their
observations valid and reliable even though they seldom use these
terms, often cannot define them, and never have impressive numbers.
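
A figure such as "a test-retest reliability of .88" is ordinarily a
correlation between the scores the same students earn on two
sittings of one test. The following is only a sketch of that idea:
the scores are invented for illustration, and the function name
pearson_r is an assumption, not something taken from any test
manual.

```python
from math import sqrt

def pearson_r(first, second):
    """Pearson correlation between two equally long lists of scores."""
    n = len(first)
    mean_a = sum(first) / n
    mean_b = sum(second) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    var_a = sum((a - mean_a) ** 2 for a in first)
    var_b = sum((b - mean_b) ** 2 for b in second)
    return cov / sqrt(var_a * var_b)

# Hypothetical scores for the same six students on two sittings of one test.
monday = [12, 15, 9, 20, 17, 11]
friday = [13, 14, 10, 19, 18, 10]

test_retest = pearson_r(monday, friday)
print(round(test_retest, 2))
```

A value near 1 means the students kept roughly the same rank order
from one sitting to the next; it says nothing about whether the test
measures the right thing, which is the separate question of validity.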

A person is said to be reliable if that person can be
depended on, time and time again, to do whatever he or she is
expected to do. Nunnally (1967) states that, "Reliability concerns
the extent to which measurements are repeatable by the same
individual using different measures of the same attribute or by
different persons using the same measure of an attribute." An
assessment measure is said to be reliable if it can be depended on,
time and time again, to do what it is expected to do. If a measure
is very reliable, it will yield approximately the same results today
as it will tomorrow or next week. You can rely on the consistency
of the response. Test makers achieve reliability in a number of
ways, one of which is by including many different items to
measure each skill. A student may miss one of the items due to
confusion, inattention, or fatigue and get most of the other items
correct. Another student may correctly guess the answer to an item
but show the true deficiency by responding incorrectly to the
remainder of the items. The scores of both students will be fairly
reliable if there are a number of different items testing the same
skill because the judgment is not likely to be made based on the
one chance mistake or guess.

The intuitive teacher achieves reliability in a similar
manner. Judgments are never made based on one observation. When I
taught first grade, we played the "whisper game" every day to
conclude our reading group. I would present some task to the
students: "Today, I want you each to whisper in my ear a word
that begins with the letter m." "This morning, I have a magic
sentence on my magic slate for all my good readers to whisper to
me." "Whisper to me something funny that happened in today's
story." The task related to a skill I was working on and, as each
child finished whispering, I would unobtrusively note the success
or failure on a checklist. I would never decide that a particular
child knew the sound/letter correspondence for the initial
consonant m or could read new words in context or remember major
events from a story based on one day's whispering. Rather, I
would repeat the tasks related to a particular skill several times.
I would eventually conclude that those children who responded
correctly to my task each time, or who only responded incorrectly
once, were doing well with the skill in question. Children who
almost never responded correctly were identified and given
additional individual instruction. For the few children whose
performance was inconsistent from one day to another, I would sit
down with each individually and probe in more depth their
understanding or lack of understanding of the skill in question.
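
The decision rule behind the whisper-game checklist can be written
out as a short sketch. The cutoffs below (at least four repetitions,
"almost never" read as at most one correct response) are assumptions
chosen for illustration; the article gives no exact numbers.

```python
def classify(observations, min_observations=4):
    """Turn one child's checklist marks for a skill into a next step.

    observations: list of booleans, one per day the task was repeated
    (True means the whispered response was correct).
    The numeric cutoffs are illustrative assumptions.
    """
    if len(observations) < min_observations:
        return "keep observing"          # never decide from one day
    misses = observations.count(False)
    hits = len(observations) - misses
    if misses <= 1:
        return "doing well"              # correct each time, or wrong once
    if hits <= 1:
        return "additional individual instruction"
    return "probe individually"          # inconsistent from day to day

print(classify([True, False, True, True, True]))
print(classify([True, False, True, False, True]))
```

The point of the sketch is only that the judgment rests on a record
of repeated observations rather than on any single response.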

There are many other strategies besides the whisper game
by which intuitive teachers achieve the goal of reliability. All of
these strategies call for making several observations of each
desired skill before making a decision. Intuitive teachers use
every pupil response for activities to increase student
participation and to "get a feel for" which students are learning,
which aren't, and which need some one-on-one probing. Intuitive
teachers ask students to write down an individual response to a
thought-provoking question and take that written response to a
small group which will discuss and perhaps even argue over the
varied responses. As the children interact in small groups, the
intuitive teacher is circulating and making notes about the higher
level thinking abilities of each student. Intuitive teachers who
have a long drive to and from school sometimes tape the oral
reading of some of their children and then listen to these tapes
while driving to or from school.

These and numerous other structured, planned, systematic
observations carried out by intuitive teachers allow these
teachers to make instructional decisions upon which they can
rely. Because these observations are carried out across several
days or weeks or months, the judgments achieve a high degree of
reliability. Unlike judgments based on a test which is given in a
single sitting, ongoing teacher observations are not affected by
day-to-day changes in students' physical health or emotional
stability.

The concept of validity is somewhat harder to explain than
the concept of reliability. Nunnally (1967) states that, "In a very
general sense, a measuring instrument is valid if it does what it is
intended to do." Validity of measurement refers to the match
between the concept or skill to be measured and the means by which
it is measured. A measure is valid to the extent that it measures
what it was intended to measure. This distinction may seem
academic and superfluous since we ought to be able to assume that
any instrument will measure what it is intended to measure. This
assumption, however, is often questionable when one considers
the limitations of pencil and paper tests. An example should
clarify these limitations.

Imagine, for example, that a reading skill important for
beginning readers to master is the association between consonant
letters and the sounds commonly associated with these letters.
Creating a valid paper and pencil test of this knowledge would
appear quite simple. One could create a test which contained some
pictures and ask students to write the letter they thought the
name of the picture began with. Is this a valid measure of their
knowledge? When the tests have been scored, will the teacher
know which students have this initial consonant knowledge and
which don't? The answer is, "Perhaps!" Imagine, for example,
that some students don't know or can't remember the names for
some of the pictures. Students who look at a picture of a dog, call
it a puppy, and write the letter p under the picture have the
wrong answer and the right knowledge. Imagine other children
who write the letter b under the picture thinking they have the
knowledge that the test was designed to evaluate. Other students
may be able to spell dog, and write the letter d under the picture.
If this response was generated by a memorized spelling, the
correct response does not indicate that the children have achieved
the desired sound/symbol correspondence. In order to create a
valid test of students' consonant sound knowledge, the test
creator would have to be sure that most of the stimulus pictures
were familiar to the children, called by the desired name, and not
familiar enough to have their spelling memorized.

There is, however, a much more serious and insurmountable
obstacle to the test procedure described above. This is
related to the issue of why it is desirable for students to be able
to associate consonant letters with sounds. This knowledge, in and
of itself, is useless. It becomes useful only when the student can
use this knowledge and the context of what is being read to
decode an unfamiliar word. What we really want students to do is
to apply their knowledge of letter/sound relationships as they
read. The previously described test procedure is aimed at testing
the student's letter/sound association knowledge, not the
application of this in real reading. So, if the desired skill is
the ability to use this knowledge in reading, how can this be
measured validly? The answer to this question is obvious and
simple. Put the children in a "real" reading situation in which
they can demonstrate their ability.

Imagine that Mr. Jones, Master Intuitive Teacher, desires
to know which students have learned consonant letter/sound
associations and can apply them as they are reading. How would
he find this out? He would probably carry out a lesson that
looked like this:

"Boys and girls, this morning I put some sentences on the
board. While I wasn't looking, a leprechaun sneaked in and
covered up some of my words with these shamrocks. He
must want us to play a guessing game since this is St.
Patrick's Day, a special day for leprechauns. Let's read
each sentence together saying 'blank' when we come to the
covered words. Then let's guess which word the leprechaun
covered up." (Students read the first sentence and make
four or five guesses for the blank.) "We certainly have a lot
of guesses. How can we decide which is right? Yes, we
could uncover the whole word, but look here, the
shamrocks are cut so that the left corner of each comes off. The
leprechaun must want us to have some extra clues. Let's
tear just this corner off." (Corner is torn off revealing the
initial consonant of the word.) "Aha! He did give us some
extra clues. Now, which of our guesses are still possible?
Yes, that word begins with an l so it could be lake but not
pond, ocean, or river. Let's try the rest in the same way.
First, without any clues we will guess how the word
begins. Then, we will make new guesses based on the clue
our leprechaun left us."
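
The two-step reasoning in this lesson (guess from context first,
then narrow the guesses with the uncovered initial consonant) can be
sketched in a few lines. The function name still_possible is
hypothetical; the guesses and the clue are the ones from the lesson.

```python
def still_possible(guesses, revealed_letter):
    """Keep only the guesses whose first letter matches the uncovered clue."""
    clue = revealed_letter.lower()
    return [g for g in guesses if g.lower().startswith(clue)]

guesses = ["lake", "pond", "ocean", "river"]   # guesses made from context alone
remaining = still_possible(guesses, "l")       # corner torn off reveals "l"
print(remaining)                               # ['lake']
```

Context supplies the candidates and the letter/sound clue selects
among them, which is the combined skill the teacher is observing.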

The lesson continues and, once the sentences on the board
are completed, each child is given a mimeographed sheet on
which are written three sentences. Each sentence has a word with a
shamrock drawn over all but the initial consonant. Students are
told that the leprechaun left them each a surprise (a leprechaun
picture to color and a puzzle are at the bottom of each sheet).
Students are then asked to read each sentence, saying "blank"
when they come to the shamrock and trying to figure out what
will go in that blank that makes sense and begins with the clue
left by the leprechaun. When they think they know what goes in
each blank, they can come up and whisper the responses to the
teacher and then, telling no one else what their guess was, they
can color the leprechaun and complete the puzzle he left for them.

As the students whisper in Mr. Jones' ear, he makes notes
about two abilities: the ability to use context to come up with a
response that makes sense in the sentence and the ability to use
initial consonant letter/sound associations to figure out unknown
words. As the morning goes on and Mr. Jones works with other
groups, he continues to structure lessons and tasks which allow
him to diagnose by observing which students can do what.

Because these observations occur as a natural part of the
lesson, children are able to demonstrate their true ability
unconfounded by the anxiety, panic, and inability to understand
directions that often result from the knowledge that one is taking a
test. Because the teacher has structured the observations so that
decisions are based on the correct or incorrect responses of the
children, the teacher views these responses in an objective,
unbiased way. Because these observations are always conducted in
the context of "real" reading, the teacher can observe not only
whether or not students have learned certain associations, but
also whether or not they can apply what they have learned as
they read. Intuitive teachers achieve validity by objectively
measuring what they choose to measure in a natural context
which simulates as closely as possible the tasks children are
actually required to perform as they read.

A Theory of Reading as a Guide to Diagnosis by Observation

At this point, you are probably asking yourself a very im- 
portant question: How do intuitive teachers know what to mea- 
sure and when to measure it? How do they decide which children 
need to be evaluated? The ability to know what they need to 
diagnose for students at differing stages of reading ability is 
what, in my belief, separates and distinguishes the intuitive 
teacher from the nonintuitive teacher. Even though few intuitive 
teachers will admit it, and many are not even aware of it, in- 
tuitive teachers have a theory of reading. 

It is with fear and trepidation that I even mention the 
word, theory, much less assert that it lies at the heart of intuitive 
teaching. In teacher training and inservice, theory has become 
almost a dirty word. "Teachers need practical suggestions,
teaching strategies they can use tomorrow," we are constantly
told. Theory is seen as extraneous to successful teaching and
learning.

The backlash against theory is an understandable and
probably a justified phenomenon. In too many cases, teachers have
been taught theory that had little or no applicability in
classrooms. Even when the theory had applications to real
teaching, the teaching of this theory was, like much teaching of
reading, not taken to the application level. Professors assumed
that teachers would be able to evolve strategies and implement
classroom practices consistent with the theory they were taught.
In many cases, this proved to be an erroneous assumption. The
saying, however, that "there is nothing as practical as a good
theory" is true if teachers have been taught how to apply the
theory or if teachers have evolved a theory based on their careful,
thoughtful evaluation of the success and failure of various
teaching practices with children at different stages of reading
development.



Now, many of you may question the statement that intuitive
teachers have a theory of reading. You may want to test my
assertion by running out and stopping teachers on the way to the
lunchroom and asking them to state in 100 words or less their
theory of reading. The response you will get from these teachers
will probably convince you that they not only don't have a theory
of reading but are hostile to the whole idea of theory and to you!
If, however, you really want to know if a particular teacher has a
theory of reading, engage that teacher in a dialogue in which you
ask questions such as: "What do you do when a child is reading
and substitutes a word that doesn't change the meaning of what
is being read?" or "Do children have to know letter names before
they can begin learning to read some words?" or "Should a child
who is in the fourth grade but is reading at the 2² level be allowed
to point to the words while reading?" "What about finger
pointing for a child reading in the preprimers?" Intuitive
teachers have answers to these questions. The answers may vary
from teacher to teacher just as the theory held by teachers may
vary. However, intuitive teachers can answer questions relating
to how reading is learned, which abilities are prerequisite to
others, and which reading strategies are appropriate at various
levels of development. This practical theory held by intuitive
teachers is not a set of abstract constructs but rather a set of
beliefs which guide the intuitive teacher to ask the right
questions at the right time.

Test makers also have a theory of reading. This theory is
evidenced by the type of tasks included on the test. It is on this
basis that intuitive teachers and test makers/promoters often
part company. In buying a test, one doesn't just buy the how of
measurement, one buys the what. A test does more than provide
a way of measuring. A test, by its very being, determines what
you will measure, whom you will evaluate, and when. If the
theory of the test maker and the theory of the teacher are
incompatible, the teacher is locked into not only testing but also
teaching in a way which is "counterintuitive."

A Balanced Program of Diagnosis 

You may have inferred as you read this article that I am
unalterably opposed to any use of standardized or criterion
referenced tests. Not so! Standardized tests serve the important
function of giving us some information about the overall
effectiveness of our reading program. I am not opposed to the use of
standardized tests when the results of these tests are used as
they were intended to be used: to make judgments about how
groups of individuals are progressing toward meeting the various
curriculum goals. I am opposed to the misuse of standardized
test scores to make decisions about how individual children are
progressing. Many test makers and test manuals will clearly
state that the standard error of measurement inherent in the test
renders the test results invalid as they relate to the progress of an
individual.

I am also not opposed to the wise use of criterion and
teacher-made tests when these tests are selected by the teacher
who will use them. When a teacher selects a test which will give
information about how various children are progressing toward
achieving certain goals, the test selected is generally compatible
with the theory of reading held by the teacher. Information
gained from the administration of teacher-selected tests is apt to
help that teacher make instructional decisions and support that
teacher's intuitive judgments.

What I am opposed to is the systemwide, countrywide, or
statewide imposition of a test package on all teachers. While the
purchase and implementation of these neat, packaged, efficient
systems hold tremendous appeal for parents, supervisors,
administrators, school board members, and legislators, a growing
number of "master teachers" stand firmly convinced that these
systems are hindering rather than promoting good instruction.
These systems often represent a short cut to the goal of improved
reading instruction; the path of these systems may be shorter,
but it may also be more hazardous; many could be lost to cliff and
gulley.

Universally good diagnosis will become a reality when we
have universally good instruction. Such instruction can become a
reality only if we have universally good teachers. This article has
attempted to describe what it is that intuitive teachers do.
Through preservice and inservice training, our teachers can learn
effective instructional techniques and can come to develop a
theory of reading. They can be taught to make valid and reliable
observations and to select tests consistent with their beliefs. The
training of intuitive teachers will not be a quick, neat,
manageable process, but the product will be as effective and as
lasting as the process was long and painstaking. While the path
may be longer, the coming home will be surer.

The Quality of Reading Miscues

W. Dorsey Hammond
Oakland University

A central question in diagnosis of any reading performance is
simply: How well does the student read? The question is
commonly answered by the comprehension section of an informal
reading inventory (IRI), which typically reports the results in
terms of reading grade level. The means by which diagnosticians
have arrived at a level of comprehension traditionally has been
the answering of questions after oral or silent reading. A retelling
procedure sometimes has been seen as an alternative to the use of
questions and is currently being used and advocated by some
diagnosticians.

There are two other informal approaches commonly used to
diagnose reading performance: a word recognition in isolation
test, and a word recognition in context test. The word recognition
in isolation score is obtained by having the subject read lists of
words of increasing levels of difficulty. An alternate procedure
used with word lists is to present each word in a "flash" and
"untimed" mode. This method assumes that a correct response on a
flash presentation is representative of a child's sight
vocabulary, whereas the "untimed" presentation is representative
of the reader's use of phonics. Despite the lack of empirical
evidence to support the diagnostic utility of this procedure, it is
nevertheless advocated by many reading diagnosticians.

The word recognition in context score seems to deserve
more attention since it reflects a reader's performance with
materials that are far less contrived and artificial than are lists of
isolated words. The word recognition in context score is obtained
by noting errors such as substitutions, omissions, and
mispronunciations as the student reads orally. The results are
usually reported as a percentage score that is obtained by
dividing the number of recorded errors by the number of words and
subtracting the resulting error rate from one hundred percent.
For example, if the passage is two hundred (200) words in length
and the reader makes twenty (20) errors, the score would be
ninety percent. There are, however, problems with the scoring of
measures of word recognition in context. Diagnosticians have
had difficulty deciding what constitutes an error, such as
whether self-corrections, regressions, and meaningful versus
nonmeaningful substitutions are errors. As early as 1946, Betts
pointed to the problem of determining just what constitutes a
reading error. Harris and Sipay (1975) used a checklist to include
detail and to weigh oral reading errors. Johns (1978) suggests
counting all errors or miscues* and then subtracting dialect
miscues, corrected miscues, and all miscues that do not change
meaning for a net score of significant miscues.
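
The arithmetic in this paragraph can be set down as a brief sketch.
The function names are hypothetical, and the counts passed to the
second function are invented for illustration; only the 200-word,
20-error example comes from the text. The net count also assumes
that each miscue is assigned to no more than one of the subtracted
categories.

```python
def context_score(total_words, errors):
    """Percentage of words in the passage read without a recorded error."""
    return 100 * (total_words - errors) / total_words

def significant_miscues(all_miscues, dialect, corrected, meaning_preserving):
    """Net count in the spirit of Johns (1978): total miscues minus those
    that are dialect variations, self-corrected, or meaning-preserving.
    Assumes the three subtracted categories do not overlap."""
    return all_miscues - dialect - corrected - meaning_preserving

print(context_score(200, 20))                # 90.0
print(significant_miscues(20, 3, 5, 6))      # 6 (illustrative counts)
```

Two readers with the same percentage score can therefore end up with
very different net counts once the kind of miscue is considered.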

Procedures such as those suggested by Harris or Johns
which are designed to deal with the qualitative aspects of oral
reading may be valuable, but they also seem somewhat
oversimplified. The determination of whether or not an oral reading
error affects meaning is not a one-dimensional consideration. For
example, does the error change the meaning of the sentence? Is
an error meaningful within the context of the story, but not
meaningful within a given sentence? In short, the evaluation of
oral reading is a fairly complex activity that must focus on the
nature of the errors made and particularly upon the extent to
which these errors distort the meaning of a passage.

In recent years, Kenneth Goodman and his colleagues have
built a model of how reading takes place, and they proposed a
comprehensive diagnostic approach for looking at oral reading
miscues based upon that model. Publication of the Reading
Miscue Inventory (RMI) by Y. Goodman and Burke (1972) is the
result of the work they have done in the area of analyzing oral
reading miscues.

* For the moment, the terms miscue and errors will be used interchangeably to
refer to any instance where what is read is different from the text of what is being
read. A rationale for why the term miscue is preferred will be developed in the
next few pages.

The RMI is designed to serve as a diagnostic instrument for
classroom teachers and clinicians; it is based on a model of
reading which maintains that there are three cue systems used in
reading: semantics or meaning clues, syntax or grammatical
clues, and graphophonics or sound and visual clues. All three cue
systems are used simultaneously by a mature reader and all
three cue systems interact with one another. Another important
part of the Goodman model stresses that meaning is both the
goal of reading and a means by which one reads and recognizes
words.

Goodman's Model of Reading, with which the RMI is
consistent, is based on the study of hundreds of students reading
orally from text. This intensive study of oral reading led to
several conclusions:

1. Errors or miscues are not random, but follow a pattern.

2. Some errors or miscues appear to be more serious than
others; in other words, some lead to a loss of meaning,
others do not.

3. Errors or miscues are not the result of careless reading
nor, in many instances, the result of poor reading since
good readers also depart from text.

4. The same words can be cued correctly in one setting or
context and miscued in another.

These findings led to the use of the term miscue, which seemed a
descriptively better term than the commonly used term error.
The research (Goodman, 1970; Goodman & Goodman, 1978)
validated what many reading diagnosticians already intuitively
knew: that errors or miscues don't just happen and that some
miscues don't interfere with comprehension, and actually may
enhance understanding, while other miscues reflect poor
comprehension. Nevertheless, the term miscue seemed a more
appropriate term since it sounds less judgmental.

The findings that miscues are seldom the result of careless
reading and that words may be recognized in one setting and not
in another suggest that the commonly held notion that a word is
either known or unknown is not supportable. The findings further
suggest that the process by which a reader identifies words
cannot be fully explained by traditionally defined word recognition
skills. In many, or most, cases what a reader produces orally is a
result of the use of meaning, the anticipation of meaning, or of
syntactical clues.

It is the balanced use of the three cue systems (meaning,
syntax, and graphophonics) that identifies the effective reader.
For example, too much reliance on word recognition or on
contextual clues can be detrimental to the reading process. The
following sentence illustrates the complex interactive nature of
the three cues:

Soon his three sisters and two brothers would come home.

David, a fourteen-year-old, read the sentence as:

"Sun, his third sister and two brothers would come home soon."

Traditionally, one would count four errors in this
sentence; however, much more is to be gained from a qualitative
interpretation of David's miscues. In effect, he seems to have
changed the word soon into the subject of the sentence and then
he seems to have done what a good reader attempts to do: he
changed three to third and deleted the s on sisters, which makes
sense both semantically and syntactically. Merely scoring errors
in this instance rather than noting a meaningful construction
would penalize this reader for the use of a reasonable strategy.
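
For contrast, the traditional tally can be reproduced mechanically.
The sketch below is not part of the RMI or of any published scoring
procedure; it simply aligns the printed sentence with David's
reading, using Python's standard difflib module, and counts every
word-level substitution, omission, or insertion. The helper names are
assumptions.

```python
import string
from difflib import SequenceMatcher

def words(sentence):
    """Lowercase a sentence and strip punctuation from each word."""
    return [w.strip(string.punctuation).lower() for w in sentence.split()]

def count_departures(text, reading):
    """Count word-level substitutions, omissions, and insertions."""
    expected, observed = words(text), words(reading)
    total = 0
    matcher = SequenceMatcher(None, expected, observed)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # A replaced, deleted, or inserted run counts once per word.
            total += max(i2 - i1, j2 - j1)
    return total

text = "Soon his three sisters and two brothers would come home."
reading = "Sun, his third sister and two brothers would come home soon."
errors = count_departures(text, reading)
print(errors)   # 4
```

The count agrees with the traditional figure of four, yet it records
nothing about whether the resulting sentence still makes sense, which
is exactly the information a qualitative interpretation supplies.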

The insertion of soon at the end of the sentence is more
difficult to explain. My own interpretation is that it represents a
simple insertion motivated by a desire for closure. It seems
unlikely that he suddenly remembered that he had
mispronounced soon and merely placed it at the end of the sentence.
Nor does it seem likely that he suddenly swept his eyes back to the
beginning of the sentence and "corrected" his first miscue. If this
were the case, he would have most likely reread or paraphrased
the entire sentence, which he did not.

Let's examine another example from David's oral reading.
He had great difficulty with the following sentence:

And I, me, myself—I need a place of my own.

David struggled with the beginning four words of the sentence,
yet most assuredly these are words he can recognize in isolation.
Here he encountered difficulty because these words seldom
appear in this particular syntactic configuration.

A miscue inventory, as exemplified by the Goodman and
Burke RMI, allows a qualitative interpretation to help
explain such reading behaviors. The RMI by Goodman and Burke
(1972) asks nine questions about each miscue. They are:

1. Is it a dialect variation?

2. Is it an intonation miscue?

3. The extent of graphic similarity?

4. The extent of sound similarity?

5. The grammatical function of the miscue?

6. Is the miscue corrected?

7. The grammatical acceptability of the miscue (in context
of prior and subsequent connected discourse)?

8. The semantic acceptability of the miscue (in context of
prior and subsequent text)?

9. The extent of meaning loss?

Questions three and four deal with the relationships between
letters and sounds, questions five and seven with grammar or
syntax, eight and nine with meaning. Question six is particularly
important because it asks if correction of the miscue was attempted.
The reason readers correct a miscue is almost invariably because
it either "doesn't sound right" or "doesn't make sense."
Attention to grammar and meaning causes good readers to reprocess
and, in effect, use good reading strategies to correct miscues.
Miscue research strongly suggests that when good readers
miscue, they err on the side of meaning, whereas poor readers err
on the side of phonics. One of the strengths of miscue analysis is
in the breadth of the instrument; namely, that it is able to
account for greater incidence of reader departure from text. The
RMI allows for interpretations that are not possible with more
traditional instruments.
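
The grouping of questions by cue system described above can be kept
as a small lookup table. This is a bookkeeping sketch only: the
dictionary and function names are assumptions, the labels are
shorthand for the three cue systems, and questions one, two, and six
are left out because the text does not assign them to a cue system.

```python
from collections import defaultdict

# Question numbers follow the nine-item list above; labels are shorthand.
CUE_SYSTEM = {3: "graphophonic", 4: "graphophonic",
              5: "syntactic", 7: "syntactic",
              8: "semantic", 9: "semantic"}

def questions_by_cue_system(mapping=CUE_SYSTEM):
    """Invert the table so each cue system lists its question numbers."""
    grouped = defaultdict(list)
    for question, system in sorted(mapping.items()):
        grouped[system].append(question)
    return dict(grouped)

grouped = questions_by_cue_system()
print(grouped)
# {'graphophonic': [3, 4], 'syntactic': [5, 7], 'semantic': [8, 9]}
```

Laid out this way, it is easy to see that each cue system is probed
by two questions for every miscue.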

An example of a fourth grade student illustrates another
reason for analyzing miscues qualitatively. In a text of
approximately 7^0 words this reader miscued more than 75 times. Yet,
in the retelling of the text, about which he had limited prior
knowledge, he was able to demonstrate a superb understanding
of what he had read. His miscues were good miscues as opposed
to bad miscues. One could argue, of course, that this reader is not
a good oral reader, but one could not question the fact that he is a
very good reader, because he comprehends what he reads.

After using miscue inventories for several years, I have
reached the following conclusions:

1. The evaluation of word recognition in isolation is less
valuable than originally believed. The results of such
evaluations are not particularly informative.

2. The word recognition in context score appears to have
less value and validity than practice seems to suggest.

3. A systematic miscue inventory helps explain why
students do what they do as they read.

4. Reading miscue analysis suggests that many children
rely too heavily on phonic or graphophonic clues.

5. Miscue analysis provides insights into the reading
process itself.

6. Miscue analysis has demonstrated the necessity of tape
recording an oral reading performance in order to make
reliable interpretations. It is impossible to code oral
reading, for whatever purpose, simply by listening once
as the child reads.

In practice, it is not necessary to administer miscue
inventories to all students. However, intensive miscue training
for classroom teachers and diagnosticians is strongly recommended
since such training will significantly strengthen the ability of
the teacher to draw diagnostic conclusions from observation
during instruction.

In the history of diagnosis of reading performance, there 
have been significant advances made both in procedures and in- 
struments. This writer regards the Reading Miscue Inventory as 
the most significant diagnostic instrument since the
popularization of informal reading procedures over thirty-five
years ago. Because of its theoretical base, its focus on the
process of reading, its breadth in terms of accommodating all
three cueing systems, and its focus on qualitative performance,
the RMI procedure allows a much more adequate response to the
question, "How well does a student read?"
New Informal Approaches to Evaluating
Word Recognition and Comprehension

Morton Botel 

University of Pennsylvania 

There are some who behave as though reading is a product, the
sum of the separate skills measured by tests. In contrast, there
are those who think of reading as a process, an aspect of the
learner's continuous search for meaning. The second of these
positions most closely represents the view of learning to read
that will be taken in this paper. I believe that we enhance the
search for meaning most productively through what might be
called holistic reading-learning experiences, that is, activities
which treat reading as part of a meaningful language experience.
Examples of productive holistic experiences include regular daily
periods of listening to literature, self-selected reading, oral
composing, and self-selected writing.

Likewise it seems reasonable to think that we encourage
and increase the search for meaning by helping the learner to get
to know the workings of language through the processes of going
from the whole of language to the parts and back to the whole
again. Throughout these activities the emphasis is consistently
on the learner's construction of meanings. This process of going
from the whole to parts and then back to the whole again will be
referred to as a holistic/analytic/synthetic process. Examples of
productive learning experiences in this mode include closure or
cloze type exercises, sentence making (How many different
sentences can be made by arranging and rearranging a selected
group of words from a story?), word making (How many different
words can be made by arranging and rearranging a selected
group of letters or letter patterns from a story?), and learning to
study informational material using a unified study approach, like
SQ4R. It might be helpful to also illustrate activities that are not
holistic in nature, but instead are illustrative of how reading
skills are fractional. Learning an isolated group of sight words,
learning how to "sound out" individual letters, studying the
meanings of a list of words, and doing practice exercises on a
so-called subskill of comprehension such as reading for details in a
test-type format are all illustrations of nonholistic activities.

As students are engaged in holistic and holistic/analytic/
synthetic language experiences, teachers can observe reading
behavior, both word recognition and comprehension. This kind of
observation is often called diagnostic teaching. As students
engage in holistic experiences, such as the reading and oral
rereading of passages of material or the oral reading of original
student writing, word recognition and comprehension abilities
can be demonstrated and observed in situations that are far less
artificial and contrived than is frequently the case in evaluating
reading skills. As Goodman (1979) has emphasized, and as Hammond
points out in an earlier chapter of this volume, an observational
focus on the reader as a searcher for meaning gives us a
way of looking at errors or miscues in a way that allows the
quality of the deviation to be taken into consideration. Thus
errors which do not interfere with meaning, such as corrected
errors, dialect variations, meaningful substitutions, and
insignificant omissions can be discounted as compared with refusals
and misreadings that do interfere with meaning. Observations of
reading by teachers in classrooms over time clearly portray the
learner in the most valid, reliable, and useful way.

There is some disagreement as to the minimal unit that can
legitimately serve as the means of evaluating reading. In the 
following statement, Goodman and Page present their concerns 
with respect to evaluation using sentences and words rather than 
larger units of meaningful language. 

The isolated word list test strategy described here is
common in school systems. It is probably worse than no test
strategy at all because the information it yields is
confusing and misleading . . . reading is treated as though the
performance of identifying isolated words by saying their
sounds is the same as the reading process itself (Page,
1979, p. 75).

Reading tests frequently establish minimal reading
situations which greatly impair the operation of one or
more of the language systems. One common procedure is
to introduce a sentence or short paragraph with one
underlined word in it followed by several items, one of
which is supposed to be a synonym for the underlined word
(Goodman, 1979, p. 16).

I respectfully both agree and disagree with these
statements. I agree that the fragmented view of reading
mentioned earlier places too much emphasis on isolated elements and
so-called subskills unsupported by science. I disagree with the
statements because most linguists, cognitive psychologists, and
psycholinguists have concluded that being conscious of how
language operates is important in learning to read. This language
awareness is developed when the learner works on aspects of
decoding at the syllable, word, and sentence level. For example,
in one analysis of the research in beginning reading instruction,
Gibson and Levin (1975, pp. 323-324) concluded that the beginning
reader already has skills of gaining information using syntactic
and semantic clues, but needs to develop conscious
awareness of the relationship between letters and sounds because
"there is nothing in language behavior or other content previously
acquired by the child that will transfer to this aspect of the
reading task." Their specific reference to the sentence and word
as an appropriate basis for instruction is: "The child should
encounter sentences from the very beginning of training, because
the sentence is the minimal unit that 1) insures comprehension
and 2) provides all three types of information [semantic, syntactic
and graphophonic]. A differentiation model will be followed, that
is, the complete sentence will be introduced first, and then will be
broken down into its component parts."

In another comprehensive review of research sponsored by 
the National Institute of Education to determine what we know
today about reading instruction, Weaver (1978) also concluded
that direct instruction in decoding should be a primary focus of
early reading instruction. 






There are seven research findings that seem especially sup- 
portive of the use of instructional strategies based on a holistic/ 
analytic/synthetic mode of language experience. These findings 
also have implications for the evaluation of reading skills. These 
findings are: 

1. It is conceptually easier to learn to recognize words as 
representing meaning directly rather than as representing 
consonant and vowel sounds (Goodman, 1979; Rozin, 
Poritsky, & Sotsky, 1971). 

2. It is easier to recognize words in context than in isola- 
tion (Goodman, 1979). 

3. At the same time the search for meaning is encouraged,
attention of the learner should gradually be focused
on the relationship between letters and sounds, sometimes
called the graphophonic principle. In learning this
principle, the syllable is a more concrete perceptual unit and,
therefore, a more learnable unit than the phoneme (Gleitman
& Rozin, 1977; Liberman et al., 1977; Rozin & Gleitman,
1977).

4. The young reader should acquire "a set for
diversity," i.e., an understanding that a letter may stand
for more than one sound (Gibson & Levin, 1975, p. 324).

5. The general ability to recognize words in isolation at 
the primary levels is highly correlated with the ability to 
recognize these words in context. Furthermore, the ability 
to read spelling pattern syllables and nonsense words is 
highly correlated with general reading comprehension. 
Such a correlation does not suggest that we ought to teach 
words in isolation or to practice decoding by using non- 
sense words (Calfee, Chapman, & Venezky, 1972; 
Shankweiler & Liberman, 1972). 

6. Vocabulary knowledge is highly correlated with 
general reading comprehension (Anderson & Freebody, 
1979). 

7. General reading comprehension cannot be reliably
subdivided into subskills, not even into the skill of
deriving explicit meaning and the skill of deriving implicit
meaning (Mason, Osborn, & Rosenshine, 1977).

These seven research findings suggest principles for
designing objective measures for determining: 1) the reading
instructional levels of students and 2) mastery of the global
patterns of English spellings, as represented by the ability to
recognize high frequency syllable/spelling pattern words in lists
or in sentences.

An Earlier Approach to Informal Evaluation 

A short history of my search for general measures for an in- 
structional level for reading and for a measure of mastery of 
English spelling patterns follows: 

In the early 1950s I developed three subtests that became
the Botel Reading Inventory (1978). With only one reading/
English consultant for Bucks County Schools in Pennsylvania
(some 45,000 students) there was a pressing need to have some
realistic ways for helping classroom teachers determine the
instructional levels of their students. Since almost all of our
teachers used basal readers and since it was found that over 25
percent of our elementary school students were placed in basal
readers at their frustration level, the placement of students at
their correct instructional level became a very important
program objective.

After trying various approaches, a combination of a word 
recognition test (as an estimate of oral reading fluency) and a 
word opposites test (as an estimate of silent reading comprehen- 
sion) was found to be a valid, reliable, and useful battery for the
correct placement of students in basal readers. 

The Word Recognition Test included preprimer through 
fourth level word lists sampled from the Botel 1180 Common
Words (Botel, 1976), derived from a frequency study of the
common words in five major basal readers. The ability to read aloud
correctly at least 70 percent of the words in a list was regarded as 
mastery at the indicated level. This score corresponded to 95 per- 
cent oral fluency in context (obviously context helps). Uncor- 
rected mispronunciations and refusals were counted as errors. 

The Word Opposites Test included first reader through 
senior high school words sampled from the Botel 1180 Common 
Words and the Lorge Thorndike Word Book of 30,000 Words. 
The ability to read silently and correctly identify at least 70 per- 
cent of the word opposites was regarded as indicating mastery at 
the indicated level.

As a measure of mastery of the major patterns of decoding,
The Phonics Mastery Test of the Botel Reading Inventory
sampled four levels of decoding: Level A, Beginning Consonants,
Blends, and Rhyming Words; Level B, Vowel Sound/
Spelling Relationships; Level C, Multisyllabic Words; and Level
D, Multisyllabic Nonsense Words. The first three levels
corresponded roughly with the graphophonic content of typical
basal readers at the time in grades one, two, and three. The format
for determining knowledge of phonics in Levels A and B was the
ability to write the first letter or vowel sound heard in a word. I
later came to believe that the ability to read syllables was the
more valid means for determining knowledge of the alphabetic
system and the new Botel Reading Inventory (1978) represents
that view.

In addition, during the same time period, I developed a
criterion system called the Cooperative Reading Checkout for
advancing students from one level of their basal reader to the next.
Essentially this involved a collaborative decision between the
teacher and the principal whenever the teacher believed the
students had mastered a given level of the basal. Mastery was
defined as reading in the last unit of the basal reader with at least
95 percent fluency in the oral reading of stories with at least 75
percent comprehension in silent reading. Oral reading errors were
defined as mispronunciations or refusals. Repetitions, insertions,
and substitutions were not regarded as errors for the purposes of
meeting the criterion if the meaning of the passage was not
essentially changed. Comprehension was judged by average
performance on workbook pages dealing with comprehension done
independently by the student at that level.
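
The checkout criterion described above can be written out as a small calculation. The following is only an illustrative sketch: the function name and parameters are hypothetical, and it assumes the teacher has already counted only uncorrected mispronunciations and refusals as errors, leaving out meaning-preserving repetitions, insertions, and substitutions.

```python
def checkout_mastery(words_read, errors, comprehension_pct,
                     min_fluency=95.0, min_comprehension=75.0):
    """Return True if both checkout criteria are met.

    words_read        -- number of words in the oral reading sample
    errors            -- mispronunciations and refusals only (assumption:
                         meaning-preserving deviations were not counted)
    comprehension_pct -- average percent correct on comprehension work
    """
    if words_read <= 0:
        raise ValueError("words_read must be positive")
    # Oral fluency is the percentage of words read without a counted error.
    fluency_pct = 100.0 * (words_read - errors) / words_read
    return fluency_pct >= min_fluency and comprehension_pct >= min_comprehension
```

For a 200-word story, eight counted errors give 96 percent fluency, so the student would meet the fluency criterion; twelve errors (94 percent) would not.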

Given one or both of these procedures, teachers were
encouraged to observe student performance in daily tasks and
modify instruction and placement accordingly.

These procedures still seem reasonably valid and most
parsimonious with respect to the management problems of
classroom teachers and in the larger context of a comprehensive
reading/language arts instructional framework.

In more recent years, as I continue my search for efficient,
useful, and time-economizing ways of testing students, I have
developed and am researching two additional procedures for
obtaining criterion referenced measures of student reading
competence. These are: 1) the Botel Milestone Tests (BMT), and 2) a
procedure for developing a maze test for placement/mastery in
any basal reader program.

Principles for Constructing the
Botel Milestone Tests (BMT)

The seven research findings cited earlier in this paper
formed the basis for the derivation of four general principles on 
which the BMT is built. These four principles are: 

1. Tests of reading should be constructed so as to use the
sentence as the primary unit of testing since the
sentence includes semantic, syntactic, and graphophonic
clues and since the sentence is a convenient unit.

2. General reading comprehension skill can be estimated
through the use of a vocabulary-in-context approach.
Such an approach is recommended because no scientific
distinctions can be made in determining the subskills of
comprehension and since a vocabulary measure is more
predictive of general comprehension than any other
measure of general comprehension. To put it another
way, a vocabulary-in-context measure is not just a
vocabulary test, it is the ideal surrogate test for general
comprehension.

3. Two groups of tests should be constructed: one to
determine the student's ability to recognize the most
common words in sentences and to decode the most
common syllable/spelling patterns, and one to determine
the student's ability to comprehend increasingly
difficult material.

4. To achieve validity and reliability the words for the
sentences should be randomly chosen or some other
means should be used to obtain a representative sample
of words. Valid studies of frequency of word use and
semantic knowledge of students at various grade levels
should be used. The preprimer word recognition test
should be developed from the Botel 1180 Common
Words (1976), the decoding tests from the American
Heritage Word Frequency Book (Carroll, Davies, &
Richman, 1971), and the advanced comprehension
tests from Dale and O'Rourke's Living Word Vocabulary.

A brief description follows of each of the subtests (and two
examples of each subtest) of the Botel Milestone Tests. Each
subtest is read silently. The mastery criterion for each subtest is
90 to 100 percent. A score of 70 to 80 percent on a subtest suggests
that subtest level is the student's instructional level. Sixty
percent or lower is regarded as indicating frustration at that level.
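
The three score bands just described can be expressed as a simple lookup. This is a minimal sketch with a hypothetical function name; the source does not say how to treat scores that fall between the stated bands (for example 85 percent), so the sketch labels them "unclassified" rather than guessing.

```python
def classify_bmt_score(percent):
    """Map a subtest percent score to the band named in the text."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if percent >= 90:
        return "mastery"          # 90 to 100 percent
    if 70 <= percent <= 80:
        return "instructional"    # 70 to 80 percent
    if percent <= 60:
        return "frustration"      # 60 percent or lower
    # Assumption: scores between the stated bands are left undecided.
    return "unclassified"
```

If each subtest has a number of items that makes every score a multiple of ten, the in-between case never arises; that is an assumption about the test layout, not something the text states.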

FOUNDATION SUBTESTS 
DECODING/COMPREHENSION (1-3) 

Functional Subtest A: Decode and comprehend sentences composed 
only of words commonly found in basal readers at the preprimer level 
(e.g. little, jump, play). 

1. Is mother in the   up   house   me?

2. I have a big blue   ball   red   fox.

Functional Subtest B: Decode and comprehend sentences in which most
words have the regularly spelled CVC (short vowel) pattern (C =
consonant, V = vowel), such as web, mad, log.

1. Mom and Dad dug in the   mud   jU   met.

2. Pam sells her pots and her   nods   pans   sits.

Functional Subtest C: Decode and comprehend sentences in which most
words have the regularly spelled CVCe (long vowel) pattern such as cage,
pipe, tide.

1. Pete is hiding the tire in the   wise   cave   cane.

2. Jane rode the bike to the   lake   mate   wake.

Functional Subtest D: Decode and comprehend sentences in which most
words have the semi-regular CVVC patterns, including the vowel
sounds other than long or short such as cow, toy, noon; the r controlled
vowel such as tar, dirt, form; and the alternate (other than those in
Subtest B) spellings of the long vowel sounds such as beef, tea, sail.

1. Ray paid for the meat with the   mood   coins   calm.

2. At noon he will feed hay to the   joy   sigh   cows.

Functional Subtest E: Decode and comprehend sentences in which most
words have the regularly spelled CCVCC (short vowel) pattern and CCVCCe
(long vowel) pattern (CC = consonant clusters, V = vowel) such as twist,
crash, shade, slope.

1. At camp Jack slept in a   frog   tent   hick.

2. Gwen put the silk belt on the   grade   shine   dress.

Functional Subtest F: Decode and comprehend sentences in which most
words have the semi-regular CCVVCC and CCVVCCe patterns including
the vowel sounds other than long or short such as mount, shoot; the r
controlled vowel sound such as storm, march; and the alternate (other
than those in Subtest E) spellings of the long vowel sounds such as
brain, feast, flight.

1. Mark spread the creamy cheese on the   blow   bread   burst.

2. Jean fainted when she saw the   doubt   teach   ghost.

ADVANCED SUBTESTS: LEVELS OF COMPREHENSION (4-12) 

Functional Subtest G: Comprehend word meanings in sentence contexts
at fourth and fifth grade levels.

1. The teacher admires good students.

thinks well of   settles down   puts blame on

2. His gambling left him broke.

like a boy   very smart   without any money

Functional Subtests H, I, J, and K would successively evaluate the
comprehension of word meanings in sentence contexts at grade levels six
and seven, eight and nine, ten and eleven, twelve and thirteen. The
format of the test would be the same as that for Level G; increasingly
more challenging vocabulary items would be used.




Procedures for Construction of a Clozure Type (Maze)
Test for Instructional Placement 

I have outlined below the second new approach I am proposing
for determining a student's instructional level in any basal
reader. An example of such a test then follows.

1. Choose a story or a coherent part of a story (more than
one, if you want alternate tests) in the last unit of each
booklevel, beginning at the primer level (there are too
few words in the preprimer). The story segment should
be approximately 100-200 words long.

2. Delete selectively 10 to 15 percent of the words in the
story. Choose words for deletion that were introduced in
the basal at that booklevel (words introduced at a
booklevel are usually listed in the appendix of the book).

3. Type this story, leaving a long blank line for each word
deleted. This should be long enough for three words.

4. On each blank line, type the word deleted and two foil
words. Take the foil words from the list of words
introduced in that booklevel or the previous level.

5. Have other persons read your test to check for
ambiguity, awkward language, or items showing a bias of
any kind.

6. Have the test typed or printed in readable type.

7. Before giving the test, instruct the students on how to
take such tests; use sample paragraphs at a lower level
of difficulty. Instruct the students to: 1) read the
paragraphs or story segment all the way through, 2)
sentence-by-sentence, choose the correct word from
those on each line, and 3) mark the correct word.

8. Consider a score of 80 percent to 90 percent on a test to
be strong evidence that the student has mastered that
level, i.e., can read books at the next higher level at least.
Consider the first level at which the student falls below
80 percent to 90 percent to be his instructional level.

9. Administer this test near the beginning of the year to
place students in basal readers and administer it again
to confirm mastery of each level when teachers believe
students have successfully completed each level.

10. When the teacher's observational judgment of a
particular student's achievement is not confirmed by the
test, the situation should be studied further. One option
is to give the student the test individually as an oral
reading test. Or the teacher might confer with a
facilitating teacher or an administrator who has also
observed the child.
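
Steps 2 through 4 and step 8 of the procedure above can be sketched in code for teachers who keep their word lists in electronic form. Everything here is illustrative: the function names, the bracketed option format, the 12 percent default deletion rate, and the single 80 percent cutoff are assumptions chosen for the sketch, and the random selection stands in for the teacher's own selective judgment.

```python
import random

PUNCT = ".,;:!?\"'"

def build_maze(passage, level_words, foil_pool, rate=0.12, seed=0):
    """Return (test_text, answer_key) for a maze test built from passage.

    level_words -- set of words introduced at this booklevel (step 2)
    foil_pool   -- words from this or the previous level; needs at least
                   two words that differ from each deleted word (step 4)
    """
    rng = random.Random(seed)
    tokens = passage.split()
    # Only words introduced at this level are candidates for deletion.
    candidates = [i for i, t in enumerate(tokens)
                  if t.strip(PUNCT).lower() in level_words]
    n_delete = min(len(candidates), max(1, round(rate * len(tokens))))
    key = []
    for i in sorted(rng.sample(candidates, n_delete)):
        token = tokens[i]
        answer = token.strip(PUNCT).lower()
        # Keep any punctuation that surrounded the deleted word.
        head = token[:len(token) - len(token.lstrip(PUNCT))]
        tail = token[len(token.rstrip(PUNCT)):]
        foils = rng.sample([w for w in foil_pool if w != answer], 2)
        options = [answer] + foils
        rng.shuffle(options)  # the correct word should not sit in a fixed slot
        tokens[i] = head + "[" + " / ".join(options) + "]" + tail
        key.append(answer)
    return " ".join(tokens), key

def instructional_level(scores, cutoff=80):
    """Return the first level whose score falls below cutoff (step 8).

    scores -- list of (level_name, percent) ordered from lowest level up.
    """
    for level, percent in scores:
        if percent < cutoff:
            return level
    return None  # every tested level was mastered
```

A generated test should still be read by other persons for ambiguity and bias, as step 5 requires; random foils can accidentally make sense in context.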

2-2 Level

Name ____________________   Date ____________



BABY HERCULES AND THE GIANT SNAKES*

Little Hercules' mother tucked him and his twin brother into
their   crabs   cribs   drawer. "Good night, babies," she said as she put out
the light. Then, just as she closed the bedroom door, two giant snakes,
pythons, came   though   throw   through   the open window.

*From Explorations, in D.C. Heath Reading, by Morton Botel, John
Hawkins, and Alvin Granowsky, 1973.

In the   moonlight   sunlight   bedtime   the babies saw the snakes
coming toward them. Soon the   huddle   huge   huggable   pythons would
wrap themselves   afraid   arrow   around   the baby boys.

"Mommy! Mommy!" Hercules' brother cried. He   tried   tied   told   to
get out of his crib. Their mother, hearing the screams, ran   quietly
prickly   queerly   toward their room.

"Mommy! Mommy!" Hercules' brother cried again. But the
baby Hercules wasn't a bit afraid. He stood up and   wasted   waited
worked   quietly. And when the pythons came near, he grabbed them by
their   necks   near   next   and held them.

By the time their mother reached the door, she saw the snakes
thrashing their giant   baskets   babies   bodies   back and forth. Baby
Hercules was holding one in each hand.

Conclusion

The most valid, reliable, and useful approach for the informal
appraisal of word recognition and reading comprehension is
observation of the reader reading in naturalistic settings over
time. Documentation of oral and silent reading in self-selected
materials, reading one's own compositions, reading plays, and
rereading textual material portrays the many faces of word
recognition and comprehension.

In addition, teachers and schools typically want an objective
criterion-referenced measure of the student's competence in
word recognition and comprehension. Given the unreliability of
tests of subskills below the level of the syllable in word
recognition and below the level of general comprehension, diagnostic
tests at these sublevels fail to meet the requirements of science.
Instead, milestone tests of general competence in word recognition
and comprehension can be developed to provide helpful information
for placement and for planning instruction. Several procedures
I have developed for this purpose represent some of the
options available.

Although the cloze procedure was originally introduced as a
technique for assessing readability (Taylor, 1953), its uses as an
evaluation tool for reading have expanded in many directions
within the past two decades. The cloze procedure possesses the
following characteristics which are frequently associated with
informal evaluation procedures for reading: 1) it can be
teacher-constructed rather than being in published form, 2) it can be
constructed from materials that might be used for instructional
purposes, 3) it uses preestablished standards to judge the adequacy
of an individual's performance rather than comparing an
individual's performance to some normative standards, and 4) it
can yield information that can be helpful in making decisions
about the levels at which a student might best profit from
instruction.

For the reasons mentioned above, these authors feel that a
strong case can be made for viewing the cloze procedure as an
informal evaluation tool for reading. However, for those who
narrowly equate informal evaluation (clearly not the position in this
volume) with informal reading inventories, the cloze procedure
will seem very different from informal methods. While IRIs are
individually administered, cloze tests are typically group
administered; while IRIs take samples of reading material from
texts without altering the form of that material, cloze procedures
require changing the text by omitting some of the words; while
IRIs tend to rely heavily on oral reading to evaluate word
identification or decoding skills, cloze evaluation materials tend to be
silently read; while IRIs assess comprehension through asking
questions that tend to rely on both understanding of the material
and memory for the material, cloze procedure assesses the ability
to use grammatical and meaning clues to fill in the missing parts
of a message.

Thus, the position taken in this paper is that the cloze
technique is one of the many informal approaches that can be
taken to measure reading. The striking differences that exist
between cloze and informal reading inventories make cloze an ideal
complement to a full informal inventory. Because IRIs are
individually administered, and they comprehensively evaluate
reading skills through a variety of approaches, they serve best as
highly diagnostic instruments that yield information useful for
making judgments about individual children. However, the very
advantages of IRIs immediately present a disadvantage: they are
very time-consuming to administer and require much diagnostic
skill for adequate interpretation. In contrast, since cloze is a
group procedure it can be administered very quickly and
efficiently to large numbers of students. The results from a cloze test
are also interpreted in a fairly straightforward fashion; however,
like most group procedures, cloze sacrifices the ability to make
detailed diagnostic observations and it lacks the precision of
results that can be achieved through individually administered
instruments. For these reasons it is best to think of the cloze
procedure as a screening device that efficiently and quickly yields
results that must be viewed as tentative.

In spite of a tremendous amount of research involving cloze 
as an evaluative procedure, as might be expected, there are still 
many unanswered questions about how effective the procedure is 
as a diagnostic tool and the form that cloze procedures should 
take. Some of the questions surrounding the effectiveness of 
cloze will be addressed in the later sections of this paper; the bulk 
of the paper will focus on the various practical approaches that
have been suggested for the use of this technique.

Description of the Cloze Technique

The term cloze was introduced by Wilson Taylor in 1953. It
is derived from the word closure, which is a concept borrowed
from the gestalt school of psychology. This school of psychology
developed in an attempt to explain the complex phenomena of
perception. The psychological approaches developed to that date
seemed at a loss to explain the difference between what actually
was occurring (sensations) and what human beings tend to
perceive. For example, what in reality is a series of individual
pictures, rapidly shown and varying only slightly, is perceived as a
picture with movement. A mosaic is really a large number of
individual pieces of material, yet it is perceived as a picture.
Through a study of complex perceptual phenomena, the gestalt
psychologists arrived at a series of perceptual "laws," one of
which was the law of "closure," which stated that when a familiar
object is presented with some detail lacking, there is a
psychological tendency to see that object as a whole unless a
deliberate attempt is made to find a missing part.

Taylor reasoned that the same psychological tendency
would exist with respect to written materials: that if there were
missing pieces, there would be a natural psychological tendency
for people to fill in the gaps to try to achieve a complete whole.
For example, given the sentence "I think I'll go for a walk in
the ______," there are a number of words that immediately come
to mind: yard, park, woods, water, night, sun. The familiarity of
the language and context of the sentence create a tendency to
want to close or complete the sentence.

In effect, cloze is a way of measuring how familiar the
reader is with the language and content of the material to be read.
From another point of view, it is a way of measuring the
closeness of the language and background of the author and the
reader; the closer the match between the two, the easier the
reading material will be. For example, many of you reading this book
would have little difficulty in filling in the missing word for the
sentence: "The combinations ch, sh, and th, are all illustrations
of ______." Because many of you bring a background in reading
with you, you probably are able to identify the missing word as
being digraphs; however, given the sentence, "Annealing with
a ______ laser beam was introduced in 1974 by Russian
scientists," many of you might have trouble filling in the missing
word pulsed because you're not familiar with the technology of
microelectric devices. In short then, cloze is a method whereby
words are omitted from sentences and a reader is asked to fill in
the missing words. The form that cloze takes will vary depending
on the purposes for which it was constructed.

Uses for Cloze 

Although there is some evidence to suggest that the cloze
procedure might be used as an alternative to standardized tests
of reading achievement (see Bickley, Ellington, & Bickley, 1970;
Rankin, 1974), most reading specialists and consultants tend to
concur that it can be more profitably adopted for the following
purposes:

1. To assess the readability of material. While much of the
material in this volume focuses on the evaluation of the
skills a reader possesses, there is constant concern in the
field of reading about evaluating the difficulty of
reading materials as well. The most frequent approach
taken to evaluating the difficulty, or readability, of
materials is the use of readability formulae. However,
most of the popular readability formulae appear to have
serious limitations, and most tend to rely solely on some
measure of word difficulty (either the number of syllables in
the word or the fact that a word is not on a list of
common words) and on sentence complexity, which is usually
measured through sentence length. Because of the
restricted nature of these formulae, it is impossible for
them to measure factors such as the use of an unusual
meaning for a common word, highly symbolic language,
awkward and confusing sentence structure, the rate at
which new ideas are introduced, or the use of
illustrations to support the development of an idea. In effect,
readability formulae tend to be rather static,
oversimplified approaches to trying to measure the
difficulty a piece of material will present for a reader. One other
important consideration is that readability formulae fail
to consider the background of experience that a reader
possesses for reading a particular piece of material. As
noted earlier, most readers of this chapter will be far
more successful reading materials related to the field of
reading than in reading about microelectronic
technology. Cloze has the distinct advantage as an approach
to readability of allowing a direct assessment of the
interaction of readers with reading materials. All of the
factors which will make a piece of material easy or
difficult will influence whether or not an individual or
group of people will be able to fill in words that have
been deleted from a selection. In his original work,
Taylor (1953) found a high rate of agreement between
readability formulae and cloze in ranking the difficulty
of three passages which appeared to contain few of the
factors beyond word length and sentence length that
influence readability; however, when he compared the
rankings of three passages which deliberately violated
some of the assumptions underlying the use of common
formulae, cloze proved a more adequate measure of
readability.

The use of cloze as a measure of readability will
probably remain limited because it does not yield the
traditional grade level scores that many teachers and
reading specialists have come to expect. Through the
use of cloze scores, passages or books can be ranked
from easiest to most difficult but not categorized as
fourth or sixth reader level. There is a roundabout way
of assigning a grade score based on cloze, but this will
be explained later in the chapter after the discussion of
the construction, scoring, and interpretation of cloze
tests.

2. To place students in basal reader series and other types
of graded instructional materials. When cloze is used
for this purpose, it is probably most like an IRI in
function. The test materials can be constructed by selecting
one or more passages from each of the texts that are
being considered for instructional use. By administering
selections from several reader levels it becomes possible
to estimate a student's independent and frustration
levels. (Criterion scores for determining functional
reading levels are presented and discussed later in this
chapter.) The differences between the use of cloze for
estimating an instructional level and diagnostically
establishing an instructional level from an IRI
have already been mentioned.

There is, however, some research which allows for an
estimate of the amount of agreement likely between
establishing an instructional level from cloze as compared
with an IRI. There is evidence (Cunningham &
Cunningham, 1978; Jones & Pikulski, 1974) that the two will
agree 70 to 80 percent of the time. Thus, cloze procedure
seems a very reasonable screening device for
instructional placement in reading.

3. To evaluate the appropriateness of content area texts.
Given the practical utility of the cloze procedure as a
technique for placing students in basal readers, it is
perhaps not surprising that content area teachers are
often encouraged to use this procedure to select texts
that are written at an appropriate level of difficulty or to
try to identify those students who may need special help
or support in order to be able to profitably use a text.
However, while it seems reasonable to assume that
content area teachers should have no difficulty adapting
cloze to screen their students for instructional
placement, it is usually inappropriate to assume that it will
enable them to place their individual students in texts
written at their instructional levels. Even when teachers
have the option of placing their students in more than
one text, it is often necessary to choose a text that is
poorly suited to the reading ability of a large number of
students. This is partially due to the fact that students
are usually assigned to content area classes without
considering either the range of differences in their reading
ability or the range of materials that are available to the
teacher. But it is also a matter of practicality: texts
differ in their content and organization, and teachers who
are expected to teach a variety of classes simply lack the
time to vary their instruction accordingly.

Thus, in many instances, cloze can be recommended
only with the expectation that it will help content area
teachers select texts appropriate for the majority of the
students in their classes. However, it may also suggest
the need for some small group work involving
discussion of vocabulary and concepts central to the content
to be read before this group of students is expected to
complete reading assignments, especially when the
assignments are to be completed independently.

A major advantage to the use of cloze in content areas
lies in the fact that it is relatively simple to construct,
administer, and score. Many content area teachers
simply do not have the background to allow them to use
more complex assessment procedures. There is also the
obvious advantage of allowing for group administration
of this procedure.

Types of Cloze Tests

Conventional, random deletion cloze tests. As suggested,
the purpose of the evaluation often determines how a cloze test
might be best constructed, administered, and scored. When the
purpose of the evaluation is any of the three just discussed (to
measure readability, place students in appropriate reading
materials, or assess readers' abilities to cope with the demands of
content area texts), teachers are usually advised to use a random
deletion cloze. In this form of cloze test (probably the most
commonly used form), every fifth word is deleted, regardless of the
word. In some variations of this procedure, it is recommended
that every seventh or tenth word be omitted. The random
deletion form of cloze has been the subject of a great deal of research,
and is very easy to construct, administer, and score.

Construction of random deletion cloze tests. Recommended
procedures for constructing a random deletion cloze test are as
follows:

1. Select a passage of approximately 250-300 words. This
passage should appear to be representative or typical of
the content of the book. If the book becomes
progressively more difficult, try to select a passage from the
second quarter of the book.

2. Inspect the passage to insure that it is not heavily
dependent on information presented earlier in the text.
If it contains a number of anaphoric words and phrases
(e.g., it, this, these points) which have referents found
only in earlier sections of the text, another passage
should be selected.

3. Keep intact the first and last sentences.

4. Randomly choose one of the first five words in the
second sentence. Beginning with this word, omit every fifth
word until 50 words have been deleted. A word is
defined as any group of letters set off by spaces. Thus, a
number such as 1980 should be deleted as if it were a
single word. However, hyphenated words are generally
considered two separate words, except in instances
where the prefix cannot stand alone, as in co-opt.

5. Replace the deleted words with blanks of uniform 
length, and number each of the blanks consecutively. 

6. Prepare an answer sheet that the students can use to
record their responses.
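
The construction steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not a procedure given in the chapter: the function name, the naive sentence splitter, and the ten-underscore blank are all assumptions, and for simplicity a hyphenated word is treated as a single word rather than split as step 4 recommends.

```python
import random
import re
import string

BLANK = "_" * 10  # blanks of uniform length (step 5); the width is arbitrary


def make_random_deletion_cloze(passage, n_blanks=50, step=5, seed=None):
    """Return (test_text, answer_key) for a random deletion cloze test."""
    rng = random.Random(seed)
    # Naive sentence split (an assumption): break after ., ! or ? plus space.
    sentences = re.split(r"(?<=[.!?])\s+", passage.strip())
    if len(sentences) < 3:
        raise ValueError("passage needs at least three sentences")
    # Step 3: the first and last sentences are kept intact.
    first, last = sentences[0], sentences[-1]
    # A word is any group of characters set off by spaces.
    # Simplification: hyphenated words are NOT split into two words here.
    words = " ".join(sentences[1:-1]).split()
    # Step 4: start at a randomly chosen one of the first five words.
    start = rng.randrange(min(5, len(words)))
    answer_key = []
    for i in range(start, len(words), step):
        if len(answer_key) == n_blanks:
            break
        core = words[i].strip(string.punctuation)
        if not core:  # token is punctuation only; nothing to delete
            continue
        answer_key.append(core)
        # Step 5: replace the word with a numbered blank, keeping punctuation.
        words[i] = words[i].replace(core, f"({len(answer_key)}) {BLANK}", 1)
    return " ".join([first, *words, last]), answer_key
```

The returned answer key doubles as the list of exact replacements needed for scoring; a passage that is too short simply yields fewer than the requested number of blanks.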

To illustrate the form that a random deletion cloze test takes,
the next paragraph of this chapter is written in the form of a cloze
test. It is an abbreviated form in that it has 15 rather than the
recommended 50 blanks. If you have never taken a cloze test, you
might find it interesting to try to fill in the missing words. You
would probably find it helpful to list the numbers 1 to 15 on a
piece of paper and write next to each number the word you think
should be inserted in the corresponding blank. The omitted words
are listed at the end of the paragraph.

The decision to delete every fifth word is based on both
practical and empirical considerations. In addition to
the __1__ that it can be __2__ adopted to construct a __3__
number of test items __4__ a reasonably short passage, __5__
Taylor (1956) and MacGinitie __6__ have provided evidence
to __7__ that an every-fifth-word __8__ pattern provides the
maximum __9__ of context necessary to __10__ reliable
responses. Leaving more __11__ four words between the __12__
blanks had no effect __13__ the restoration of missing __14__
and, thus, no apparent __15__. The obvious disadvantage to
using every sixth or seventh, etc., word is that students
need to read longer passages in order to respond to 50
deletion items.

Omitted words are: 1. fact 2. easily 3. large 4. from 5.
both 6. (1961) 7. suggest 8. deletion 9. amount 10. elicit
11. than 12. cloze 13. on 14. words 15. advantages

Using a 250-300 word passage is also a recommendation
based on experimental evidence. Generally, a 250-300 word,
50-item cloze passage can be expected to yield a reliability
coefficient in the neighborhood of .85 (Bormuth, 1975). Although this
seems sufficient for most of a teacher's purposes, it must be
emphasized that an individual's score on a test having a reliability
coefficient of less than .90 cannot be interpreted with a great deal
of confidence. At best, it represents only an estimate of the
student's ability to deal with the demands of the material.
However, it seems somewhat impractical to recommend that the
teacher select a longer passage since it would be necessary to
double the number of test items in order to raise the reliability to a
minimum of .90 (Bormuth, 1975).
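
The link between test length and reliability described above can be illustrated with the Spearman-Brown prophecy formula, a standard psychometric result. The chapter does not say how Bormuth (1975) arrived at his figures, so applying this formula here is an assumption, and the function names are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is made length_factor times longer."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


def length_factor_needed(reliability, target):
    """How many times longer a test must be to reach the target reliability."""
    return (target * (1 - reliability)) / (reliability * (1 - target))


# A 50-item test with reliability .85, doubled to 100 items.
doubled = spearman_brown(0.85, 2)
# Lengthening needed to move from .85 to .90 under the same formula.
factor = length_factor_needed(0.85, 0.90)
print(f"doubled test: {doubled:.3f}, factor needed for .90: {factor:.2f}")
```

Under this assumption a doubled test clears the .90 threshold, which is consistent with the point that a considerably longer passage would be needed.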

In addition, a cloze test based on a 250-300 word passage
has several practical advantages. A passage of this length
happens to fit comfortably on a single sheet of paper and, as a result,
it is likely to encounter less student resistance than tests
constructed from longer passages. Also, it is easy to calculate
percentage scores on a 50-item test: simply multiply by 2.

Administration. If students have had no previous
experience with cloze, it is advisable to do a practice exercise. This
exercise should be constructed from materials that are fairly easy
for the students to read and should consist of approximately 10
items. To conserve time and facilitate group discussion, the
exercise may be written on the board or displayed with an overhead
projector.

Instructions will differ depending on the age of the
students and the type of cloze test that is to be administered. When
a random deletion cloze is being used, the following directions or
some variation might be used:

Some words have been left out of these sentences. Your job is to
fill in as many of the missing words as possible. Some of the later
sentences may give you clues about the earlier ones. The best
thing to do is to read through all of the sentences first, and then
go back to the beginning and try to fill in the blanks. Only one
word goes in each blank.

After the students have had sufficient time to read the
passage, individual students should be asked for possible
answers. Any answers that are meaningful and syntactically
correct should be accepted. This should help students recognize that
there are, at times, several reasonable choices for filling in the
blanks. If an answer does not seem reasonable (e.g., it instead of
they, when a plural noun is the referent), the clues that might be
used to help the students choose a more appropriate response
should be discussed.

Once the introductory exercise has been completed, copies
of the cloze test should be distributed to each of the students,
along with the following advice:

Although this exercise is similar to the one we have just
completed, you will very probably find it more difficult. No one is
expected to answer all of the items correctly. Try to choose the
words you feel best complete the sentences, and remember to
write only one word in each blank. You may skip hard blanks and
come back to them when you have finished. If you are not sure
how a word should be spelled, give it your best try. Wrong
spelling will not count against you.

If numbers, contractions, or hyphenated words have been
deleted from the passage, the students should also be given some
representative examples of the types of answers that can be used
to fill in the blanks. Although students should be encouraged to
work as long as they please, the teacher may want to set a time
limit when it appears that their efforts are no longer productive.

Scoring. One of the most seriously misunderstood aspects
of using a random deletion cloze is that students should be given
credit only for answers that are exact (verbatim) replacements of
the missing words. Words with spelling errors may be considered
correct as long as it is evident the student intended to write the
word originally deleted. But no credit should be given for
synonyms or other types of substitutions (girls for girl, walk for
walked) even though they may seem somewhat acceptable.

The decision to use verbatim as opposed to synonym
scoring is based on a considerable amount of experimental evidence
as well as practical considerations. A number of researchers
(Gallant, 1964; McKenna, 1976; Miller & Coleman, 1967; Ruddell,
1964; Taylor, 1953) have compared exact replacement scores with
various types of synonym counts and have concluded that the
latter are not worth the extra time and effort. Synonym counts tend
to yield slightly higher correlations with other measures of
reading comprehension (Gallant, 1964; McKenna, 1976), but they
also tend to be less reliable since they are based on subjective
judgments of what is and what is not an acceptable response. In
addition, it appears that there is absolutely no advantage in
giving credit for synonyms when the purpose of the evaluation is to
obtain an estimate of students' abilities to meet the demands of
material. Although synonym counts tend to yield higher scores
than exact replacement counts, correlations between the scores
derived from the two types of techniques usually exceed .95
(McKenna, 1976; Miller & Coleman, 1967). Consequently, it can
be assumed that students will be ranked in almost exactly the
same way, regardless of the manner in which their answers are
scored. If synonyms are accepted, the teacher or specialist will be
forced to require a much higher percentage score as a standard of
acceptability; thus the student really achieves no advantage.

However, the primary reason for recommending that the
teacher accept only exact replacements is really very simple:
There are no available guidelines for determining the students'
functional reading levels when more subjective scoring
procedures of accepting synonyms are adopted. As might be assumed,
the criterion scores derived are based on the assumption that the
student has been given credit only for answers identical to the
words appearing in the original passage. Somewhat higher
standards would obviously need to be established if synonyms were
considered acceptable responses.
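
Exact replacement scoring, as described above, can be sketched as follows. The function name and the `accepted_misspellings` argument are hypothetical: because only the teacher can judge whether a misspelling was evidently meant as the deleted word, the sketch assumes those judgments are supplied by hand for each blank.

```python
def score_cloze(answer_key, responses, accepted_misspellings=None):
    """Return (number_correct, percent_correct) under verbatim scoring.

    accepted_misspellings maps a blank number (1-based) to the set of
    misspellings the teacher has judged to be the deleted word.
    """
    accepted = accepted_misspellings or {}
    correct = 0
    pairs = zip(answer_key, responses)
    for number, (answer, response) in enumerate(pairs, start=1):
        given = response.strip().lower()
        # Credit only the exact word, or a teacher-approved misspelling of it.
        # Synonyms and changed word forms (girls for girl) earn no credit.
        if given == answer.lower() or given in accepted.get(number, ()):
            correct += 1
    # Blanks left unanswered count as wrong: divide by the full key length.
    return correct, 100.0 * correct / len(answer_key)


key = ["fact", "easily", "large", "from"]
answers = ["fact", "easly", "big", "from"]
print(score_cloze(key, answers, accepted_misspellings={2: {"easly"}}))
```

On a 50-item test the percentage is simply the number correct multiplied by 2, as noted earlier.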

Interpretation. Several different strategies have been
adopted in attempting to derive criterion scores for judging
student performance on random deletion cloze exercises. Bormuth
(1967, 1968) used two types of measures (examiner-constructed,
multiple-choice comprehension tests and an expanded version of
the Gray Oral Reading Tests) to determine comparable
independent and instructional level scores on random deletion exercises.
The results suggested that cloze scores of 57 and 44 percent
accuracy correspond to comprehension scores of 90 and 75
percent, the comprehension standards that traditionally have
been adopted for evaluating independent and instructional
reading levels. Rankin and Culhane (1969) replicated Bormuth's
work in a study comparing cloze with other multiple-choice tests
and reported very similar results.

However, Bormuth (1968, 1969, 1971) and others have also
provided some evidence to suggest that these criteria may not be
the most appropriate standards to adopt when other variables
are taken into consideration. For example, when Bormuth (1968)
used oral reading accuracy as the criterion for determining
comparable levels of cloze performance, he found that somewhat
lower cloze scores might be used as the standards for judging
independent and instructional reading levels. The results of this
analysis indicated that cloze scores of 54 and 34 percent were
comparable to the conventional standards for independent (98
percent) and instructional level (95 percent) performance on
measures of word recognition. Similarly, studies comparing
student performance on IRIs and cloze exercises have generally
suggested somewhat less stringent criteria for converting cloze
scores into equivalent functional reading levels. Both Ransom
(1968) and Jones and Pikulski (1974) concluded that the following
criteria approximate the results obtained on IRIs: independent,
above 45 percent; instructional, 30-45 percent; and frustration,
less than 30 percent.

To confuse the issue even more, we feel we should point out
that Bormuth (1971) has also conducted research which suggests
that higher standards need to be adopted if the amount of
information to be gained as well as the novelty of the material and the
students' willingness to study and rate of reading are to be
considered in selecting appropriate reading materials. In order to
maximize the value of each of these variables, Bormuth
suggested that cloze scores should fall within the range of 49 to 59
percent when the material is being considered for instructional
purposes.

Obviously, much more research needs to be conducted to
resolve some of the discrepancies in the criterion scores that have
been suggested. However, for the teacher's purposes it appears
that the following criteria might be adopted as reasonable
starting points in evaluating student performance on random deletion
cloze tests:
Independent level: Students who obtain cloze scores of at
least 50 percent should be able to read the material with
relative ease. No teacher guidance should be necessary.
Consequently, this material should be appropriate for
homework assignments and other types of independent
projects.

Instructional level: Students scoring between 30 and 50
percent should be able to use the material for instructional
purposes. However, some guidance will be necessary to
help them master the demands of the material.

Frustration level: Students having scores of less than 30
percent will usually find the material much too
challenging. Since there is almost no potential for success, the
material should be definitely avoided.

While these criteria are suggested as beginning points,
teachers and reading specialists may need to adjust them based
on the experience they have with cloze. Again, there is some
evidence to suggest that more stringent criteria than those listed
above should be used.
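
The three starting-point criteria above translate directly into a small lookup. This is a minimal sketch with a hypothetical function name; the cut-offs of 50 and 30 percent are the ones suggested in this chapter, and a score of exactly 50 is treated as independent because that level is stated as "at least 50 percent."

```python
def functional_level(percent_correct):
    """Map a verbatim cloze percentage to a functional reading level."""
    if not 0 <= percent_correct <= 100:
        raise ValueError("percent_correct must be between 0 and 100")
    if percent_correct >= 50:
        return "independent"    # can read the material with relative ease
    if percent_correct >= 30:
        return "instructional"  # usable with some teacher guidance
    return "frustration"        # material should be avoided


for score in (62, 44, 18):
    print(score, functional_level(score))
```

Teachers who adjust the criteria from experience would change only the two cut-off values.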

At this point it seems appropriate to point out that when
using cloze as a measure of readability one could roughly attach a
grade level designation to a text if a cloze test were administered
to a group of individuals of known reading ability. For example, if
it were known that a group of students were reading at about a
fourth grade level and if on the average they scored about 45
percent on a cloze test from a particular book, one could say that the
book was about at a fourth grade level.

Additional Uses for Cloze

A modification of the conventional, random deletion cloze
is necessary when cloze is used for purposes other than the three
previously discussed. For example, cloze can be used to evaluate
student mastery of content area instruction or for diagnosing the
student's ability to use various types of contextual clues. When
the cloze procedure is used for these purposes, it is usually
recommended that the teacher prepare an exercise in which the words
are deleted on a rational rather than a mechanical basis.
Examples of rational deletion cloze exercises will follow.

In addition to the fact that only key words are deleted,
rational deletion cloze tests differ in that synonyms are now
accepted as correct. If the exercise is to be used to diagnose the
student's strengths and/or weaknesses, or to assess how much
he/she has learned, the teacher would certainly want to accept
synonyms for scoring a rational deletion cloze. Verbatim scoring
is necessary only when one wishes to establish functional reading
levels, to assess readability, or to evaluate content materials.

Some of the uses and forms that rational deletion cloze
tests can take follow:

Using cloze to gain a clearer perception of the student's
ability to use contextual clues as an aid to word recognition.
Although much more research needs to be conducted to
determine its diagnostic capabilities, it is generally assumed that cloze
can be used to gain insight about the student's ability to use
various types of syntactic and semantic clues. When it is adopted
as a diagnostic tool, it is recommended that the teacher construct
separate exercises for each of the skills that are of interest, such
as recognition of pronoun antecedents, subject-verb agreement,
and semantic relationships. Selected portions of exercises that
might be used to assess some of these skills are presented below:
Pronoun antecedents: Bill and Henry loved to play tricks
on ______ sister. Once Bill mailed Judy a large, blank canvas
and asked ______ to enter it in the local art show. Judy was
surprised when ______ saw what ______ planned to enter.

Semantic relationships: The koala bear is one of the most
helpless of all wild ______. Whenever there is any sign of
danger, koalas become very ______. Usually, they climb to
the ______ of a tree and stay there until everything seems
______ again.

The diagnostic utility of these and similar exercises
depends, of course, on the care with which the exercises are
constructed and the student's responses are analyzed. As these
illustrations suggest, the words to be deleted need to be carefully
selected in light of the specific purpose of the exercise. For
example, if the teacher is interested in evaluating the student's ability
to use pronoun antecedents as a clue to word recognition, only
those pronouns that seem to have definite antecedents should be
deleted. The deletion of pronouns that do not have any
identifiable referents will not only make the exercise frustrating for
the student to complete but will also increase the probability of
making an inaccurate diagnosis.

Similarly, it seems critical that teachers select passages
which will enable them to construct exercises having relatively
large numbers (25-50) of deletions. As in all forms of evaluation,
the confidence one can place in the results depends on the length
of the measure: the greater the number of deletions, the easier it
is to assume that the exercise will provide an adequate sample of
the student's skill in using a particular type of contextual clue.

In addition, it is suggested that teachers select passages
written at the student's independent or, at most, instructional
level. This should reduce the possibility of concluding that the
student needs to become more aware of particular types of
contextual information when the difficulty lies only in the student's
ability to recognize the words appearing in the exercise.

Finally, we feel we should again emphasize that it would be
inappropriate to evaluate students' needs for additional
instruction simply by comparing their responses to the exact words
deleted from the material. While reproduction of the exact word
is required when cloze is used to screen students for instructional
placement, it clearly does not give students credit for responses that
reflect their ability to use various types of contextual clues. For
example, the first blank in the passage about koala bears might
be appropriately clozed by a number of words: creatures (the
word that was actually deleted), animals, or bears.

Unfortunately, there are no established guidelines for
determining which responses should and should not be
considered acceptable. Nor are there any guidelines to indicate how
many of the students' responses need to be considered acceptable
in order to assume they have mastered the skill being evaluated.
While these are issues that are not easily addressed, they are
hardly peculiar to the cloze procedure. Teachers who rely on other
types of informal techniques (classroom observations,
teacher-constructed mastery tests) to diagnose their students' strengths
and weaknesses are constantly confronted with the problem of
establishing appropriate criteria for instructional mastery.

Using cloze to assess student mastery of the content of a
particular instructional unit. Although there is virtually no
empirical evidence to suggest that cloze can be used to evaluate the
effectiveness of content area instruction, it seems reasonable to
assume that it can be used for this purpose if the following
precautions are adopted. First, the teacher should either select or
prepare a passage which summarizes the concepts under
consideration. Second, only key or important words should be
deleted. Third, the students should be given credit for synonyms
and other types of substitutions which are indicative of their
understanding of the concepts that are being evaluated. To the
extent that the students' errors are not simply a reflection of
their ability to recognize the words appearing in the exercise, the
results should help the teacher assess the overall quality of the
instruction that has been provided as well as the need for review
and reinforcement.

Additional Variations of the Cloze Procedure 

Maze technique. One of the more popular variations of the
basic cloze procedure was first suggested by Gallant (1964, 1965).
She reasoned that it would be preferable to use a multiple-choice
format with young children since they might have difficulty
recording their answers on conventional cloze exercises. Guthrie,
Siefert, Burnham, & Kaplan (1974) also suggested the use of a
multiple-choice cloze procedure to monitor growth in reading and
to guide the selection of reading materials. This procedure
requires the reader to choose the words that constitute the most
sensible path through a verbal "maze"; hence the term maze
technique. The following illustrates the form a maze test might
take:

"Cloze is one [evaluation / procedure / prefer] that has been
recommended [for / him] a wide variety of [purposes / errors]."

The maze technique has both advantages and
disadvantages. Its primary advantage is that children find it less difficult
and hence less objectionable than conventional cloze. Likewise, it
requires less time to administer. Its disadvantages are that it is
much more time-consuming to construct and has been subjected
to far less research.

Guthrie and his colleagues suggest that the teacher adopt
the following guidelines for constructing, administering, and
scoring a maze test:

1. Select a representative passage of approximately 120
words in length.

2. Replace every fifth word with three alternatives. These
alternatives should include: a) the word originally
deleted; b) a distractor which is the same part of speech as
the deleted word; and c) a distractor that is
syntactically different from the omitted word. (No guidelines
have been established to indicate how close in meaning
the distractors should be to the correct choice. For
practical purposes we suggest that, where possible, the
teacher choose the distractors by scanning a previous
page of the material being used to construct the maze,
and selecting words that fit the criteria for being a
distractor.)

3. Vary the alternatives so the correct answers do not
appear in the same position throughout the exercise.

4. Distribute copies of the exercise to students and have
them circle the correct choices.

5. Give students credit only for the selection of exact
replacements.
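
Steps 2, 3, and 5 of these guidelines can be sketched in Python as follows. The function names and the dictionary layout are hypothetical, and the sketch assumes the teacher has already chosen both distractors for each deleted word, since judging parts of speech is left to the person constructing the test.

```python
import random


def make_maze_item(correct, same_pos_distractor, other_pos_distractor, rng):
    """Build one maze item with its three alternatives in random order."""
    options = [correct, same_pos_distractor, other_pos_distractor]
    rng.shuffle(options)  # step 3: vary the position of the correct answer
    return {"options": options, "answer": correct}


def score_maze(items, choices):
    """Percent of items on which the student circled the exact replacement."""
    correct = sum(
        1 for item, choice in zip(items, choices) if choice == item["answer"]
    )
    return 100.0 * correct / len(items)


rng = random.Random(0)  # fixed seed only so the layout is reproducible
items = [
    make_maze_item("procedure", "evaluation", "prefer", rng),
    make_maze_item("for", "with", "him", rng),
]
print(items[0]["options"], score_maze(items, ["procedure", "him"]))
```

The second item's distractors are invented for illustration and are not taken from the sample maze shown earlier.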

The criteria to be used in interpreting the results of a maze
exercise seem somewhat more tentative than the standards that
have been suggested for random-deletion cloze. Guthrie et al.
suggested that "if a child is performing at about 90 percent accuracy
for three or four administrations of the maze, more difficult
material should be introduced. Optimal teaching levels are about
60 to 70 percent accuracy (p. 167)." Thus, an independent level on
the maze would be 90 percent and above, an instructional level 60
to 69 percent, and a frustration level below 60. However, Pikulski
and Pikulski (1977) have provided some evidence to suggest that
these criteria may need to be raised when they are used with
regular classroom students. In a study comparing the maze
scores of 61 fifth graders with teacher judgments of students'
functional reading levels, they found that the maze technique
overestimated students' reading ability more than 45 percent of
the time. These results differed significantly from those obtained
in a preliminary study (Pikulski, 1975) conducted with reading
disabled students attending the University of Delaware Laboratory
School. When working with reading disabled students, the
standards recommended by Guthrie et al. appeared to be
appropriate.
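The tentative criteria above can be written as a small lookup. This is a sketch, not an established scoring rule: the function name `maze_level` is hypothetical, the instructional band is taken as 60 to 70 percent following the quoted passage, and scores between that band and 90 percent are labeled "unspecified" because the text assigns them no level.

```python
def maze_level(percent_correct):
    """Map a maze score (0-100) onto the tentative levels described above."""
    if percent_correct >= 90:
        return "independent"
    if percent_correct < 60:
        return "frustration"
    if percent_correct <= 70:
        return "instructional"
    # The text gives no label for scores above the instructional band
    # but below 90 percent.
    return "unspecified"
```

Given the Pikulski and Pikulski finding, these cutoffs may need to be raised for regular classroom students, so the thresholds are better treated as adjustable than as fixed.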

Similarly, the reliability and validity of the maze technique
have not been well established. To our knowledge, only three
studies have addressed these issues. Guthrie (1973) used the data
collected on a group of 36 children, ranging in age from seven to
ten years, and found that none of the internal consistency
reliability coefficients for each of seven passages fell below .90. He
also reported a correlation of .85 between maze and the
Gates-MacGinitie Vocabulary Test and a correlation of .82 between
maze and the Gates-MacGinitie Comprehension Test. Similarly,
Bradley, Ackerson, and Ames (1978) reported moderately high
correlations among alternate forms of the maze, constructed by
different teachers and administered to second graders. However,
Bradley and Meredith (1978) also concluded that it may be
inappropriate to use maze for assessment at the intermediate and
junior high levels when it is administered in its typical format.
In a study of fourth, sixth, and eighth grade students (N = 335),
they found that the cloze procedure tended to place subjects
either at the instructional or frustration level, while parallel
forms of the maze produced a ceiling effect, placing students
predominantly at the independent level. To increase maze score
variability and, thus, its reliability and overall ability to detect
differences in reading achievement, the investigators suggested
that the following modifications be considered: "a) discarding
the option type (i.e., distractor) utilizing a syntactically incorrect
word; b) devising new option types (e.g., semantically correct
within sentence but semantically incorrect within passage);
c) increasing the number of options per item" (p. 188).

Post-oral reading cloze. Another variation of the basic cloze
procedure is the post-oral reading cloze test developed by Page
(1977). This type of cloze test is constructed in exactly the same
manner as the conventional, random-deletion cloze. The only
difference is that students are asked to read the intact passage
orally before they are administered the cloze material. Page
suggested that this procedure provides a valuable link between the
evaluation of oral reading and reading comprehension. Based on
Page's research, one should expect a post-oral reading cloze to
yield scores that are 10-20 points higher than conventional cloze
scores.

Limited cloze. In an attempt to provide teachers with a
more appealing alternative to conventional cloze, Cunningham
and Cunningham (1978) have developed another type of
multiple-choice procedure. This procedure is called limited cloze because it
differs from the conventional, random-deletion procedure only in
one respect: the deleted words are randomly ordered and listed
above the passage, providing the student with a limited number
of choices to insert in the blanks. Words deleted more than once
are listed at the top of the test as many times as necessary, and
students are informed that each of the words in the list can be
used only once to fill in the blanks. As in the conventional
procedure, only exact restorations of the original words are scored as
correct.
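A limited cloze exercise of this kind might be assembled as follows. The sketch is illustrative only: the function name `build_limited_cloze` is hypothetical, and deleting every fifth word is an assumption, since the description says only that deletion follows the conventional procedure.

```python
import random


def build_limited_cloze(passage, every=5, seed=None):
    """Blank out every `every`-th word and list the deleted words, shuffled.

    Assumption: deletions fall on every fifth word.
    Returns (word_bank, blanked_text, answer_key).
    """
    rng = random.Random(seed)
    words = passage.split()
    answer_key = []
    for index in range(every - 1, len(words), every):
        answer_key.append(words[index])
        words[index] = "_____"
    # A word deleted more than once is listed once per deletion,
    # because each listed word may be used only once.
    word_bank = answer_key[:]
    rng.shuffle(word_bank)
    return word_bank, " ".join(words), answer_key
```

Scoring then follows the conventional rule: a response counts only if it restores the exact original word for that blank.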

The Cunninghams have suggested that limited cloze has
several advantages. In addition to the fact that it is easy to
construct and administer, it also has the advantage of reducing some
of the resistance often encountered when teachers are asked to
use verbatim scoring procedures. Also, limited cloze avoids the
problem of developing appropriate distractors, an issue that is
often raised when the maze technique is adopted. And finally, it
appears that limited cloze is as valid and reliable as conventional,
random-deletion cloze. In two separate studies, the Cunninghams
found that limited cloze tests yielded substantially higher
internal-consistency coefficients than conventional cloze
passages: .85 versus .64 in one study, and .90 versus .70 in the
other. They also found that limited cloze scores correlated more
highly with the comprehension subtest of the Iowa Tests of Basic
Skills than did those obtained with conventional cloze, although
the difference in the validity coefficients was not significant.

The major limitation of the limited cloze is that it yields
scores that are not easily converted into functional reading
levels. In their preliminary work with the limited cloze, the
Cunninghams found that it yielded an instructional range of 60 to 81
percent in one study and a range of 73 to 93 percent in another.
Whether a reliable instructional range can be established remains
an issue for subsequent research.

Conclusions

The forms cloze tests can take and the uses for which they
can be employed vary considerably. Hopefully, this chapter has
pointed to ways in which this procedure can be flexibly used as an
informal evaluation tool. While cloze has been subjected to
substantial research during the past two decades, teachers and
reading specialists must recognize that it is not a totally reliable
or valid way to measure reading skills but, certainly, this is a
caveat that could be applied to virtually every other approach to
evaluating reading. Overall, cloze appears to have many
advantages, so that it seems reasonable to conclude that it can be used
profitably as one approach to informally evaluating reading
skills.

Informal Diagnosis of Content
Area Reading Skills

T. Stevenson Hansell

Wright State University 

Educational diagnosis, the act of learning about skills that a
student possesses prior to instruction, is a step which logically
follows a clear description of the goals a teacher holds for a class of
students. Goal setting, making decisions about concepts and
procedures that students should master, must serve as the basis for
sampling and evaluating student behavior. It is on the basis of
established goals, and through diagnosis, that a teacher makes
such important decisions as what to include; how to relate new
concepts to past experiences; the rate and sequence of
instruction; choosing instructional procedures that lead to effective and
efficient learning; and, finally, selecting materials that will
contribute to the achievement of goals. The establishment of goals
must be the step before diagnosis since using testing instruments
before goals clarification will fragment the instructional effort.
Once goals have been established, however, a teacher can
intelligently select and design informal procedures which will
measure student performance in relation to selected goals. Thus,
diagnosis is an intermediate step between the description of long
range goals and the development of short-term objectives.

Instructional goals are established on the basis of a
teacher's knowledge, philosophy, attitudes, and abilities, as well
as teaching environment. Most likely, the goals adopted by the
content teacher will vary little regardless of student skills and
abilities. Therefore, informal diagnostic measures should serve as
valuable tools in allowing teachers to understand the skill levels
that students bring with them to content materials.

A distinction is frequently drawn between diagnosis and
evaluation. Diagnosis often refers to some type of formative or
preinstructional information collection, whereas evaluation more
commonly describes postinstructional information collection.
However, the distinction is necessarily a somewhat artificial one
because of the need for ongoing diagnosis to make needed
adjustments in short-term instructional objectives. Postinstructional
evaluation should not only evaluate the student's previous
learning, but it should also suggest needs for future instruction;
in other words, frequently evaluation and diagnosis will take
place simultaneously using the same information. The distinction
is not better clarified if one examines the types of tasks or
materials used in diagnosis or evaluation. In fact, the proposals
made in this chapter suggest the use of content-relevant tasks
and texts as the basis of both diagnosis and evaluation. The true
difference separating diagnosis and evaluation is the purpose for
which they are done. Diagnosis should help with future
instructional decision making, and evaluation should measure the
accomplishment of previous goals and objectives. Because of the
similarities in methodology, the terms diagnosis and evaluation
will be used interchangeably in this chapter. It is left to the
reader to decide the purposes toward which these techniques will
be used.

The purpose of this paper is to show that reading goals are
often content specific; the accomplishment of some reading goals
is integral to the overall goals of the content.

Learning to Read and Reading to Learn

When content teachers have a list of behaviors a student
cannot do (identify sequence, form hypotheses, and identify root
words), a frequent reaction is, "Send these kids back to me when
they can read." This reaction is understandable when viewed
from the perspective of a teacher whose long range goals treat the
deficient skills as only peripherally a part of his or her
responsibilities.

To understand the role of reading instruction in the content
area classroom it is important to distinguish between the goals of
the reading teacher and those of the content area teacher. The
differences between teacher goals for content reading and for basal
reader reading are partly differences of emphasis. When students
read a basal reader, the teacher more frequently focuses on such
concepts as word recognition and comprehension. That is, a
reading teacher focuses on teaching each student how to
understand words and passages. In content area instruction when
students read a textbook, teachers frequently focus exclusively on
the product of reading or what to understand. Thus, when
assigning content reading, teachers sometimes disregard the how to.

Because of the different goals of authors who write basal
readers and authors who write texts, there are clear differences in
the vocabulary the authors use. Basal readers are generally
written to contain only those words used frequently in written and
spoken language. Most publishers have stepped away from strict
vocabulary controls, but most basal series still rely on a core of
words which are presented, repeated, and reviewed for several
years. On the other hand, authors of content and reference books
select words to represent the ideas they want to communicate.
When the idea or concept is new to an individual student, the
words will be new also. Thus, the vocabulary of the content text
is both more varied and more technical than that of the basal
reader.

While reading teachers should provide instruction in how
to identify important vocabulary and how to make sense of text,
it is appropriate that the content teacher keep the major focus on
the ideas of the content. A content teacher may encourage the
application of context, structural analysis, phonics, and dictionary
skills, but the use should be restricted to the needed vocabulary
terms. Similarly, the content teacher may work with students to
outline a passage, but the focus will remain on the ideas
represented within the passage, with outlining seen as a means to
this end.

A second difference is that informational materials are
written in a different style than stories. Such stylistic differences
(i.e., less narrative, more exposition in content materials) assume
different reader purposes and they entail differences in the tasks
of understanding. For example, the narrative style of a story is
inappropriate for mathematical thinking. Similarly, the listing
style of a recipe is inappropriate for learning to think about
history as a reflection of human behavior. A goal of content
teachers, related to this difference of style, is that students learn
how to think about a specific topic. To accomplish this goal some
reading instruction must proceed from materials which
legitimately require students to think about the content of
interest.

The expectations a student develops from extensive
exposure to story or narrative style writing lead a reader to select
details about characters, events, and ideas in relation to an
abstract idea of plot (Stein and Nezworski, 1978). The
expectations of a math, science, or unified arts teacher, however, are that
frequently each step must be mastered as a firm foundation for
successive concept construction. For a student to be successful,
he/she must learn a new mental set or scheme (Fredericksen,
197o) of expectations for different writing styles along with a
plan for recognizing writing style and the topic before he/she
reads. In short, each student must learn to exchange the
treat-events-lightly-build-meaning-from-sequences style of reading,
which is appropriate for basal stories, for the
stop-reread-learn-and-then-proceed style which is necessary to understand new
ideas presented in content reading.

Given the differences of vocabulary and style, and their
impact on the purposes and processes of reading, teachers find that
students require guidance in the comprehension of content
reading. This guidance or instruction is appropriate within
content classes where the teacher goals include, first, an
understanding of the content and, second, an increase in each student's
ability to read in the field of study. Diagnosis based on a teacher's
content area goals can clarify what instruction, including reading
instruction, will be helpful for each student. Thus, diagnosis is a
means to the end of better pupil understanding.

Approaches to Measuring Comprehension

Once a teacher has established clear goals for concepts and
attitudes he/she wishes to address, it is appropriate to examine
ways to measure what a student already knows and how well that
student can understand a printed description of new ideas.

However, a problem arises in selecting appropriate informal
measures because of a lack of agreement about what
comprehension is and how it can be measured. Simons (1972) has
discussed seven different approaches to measuring comprehension.
Three of the seven (the measurement, factor analysis, and
correlation approaches) will not be discussed here because they
deal with formal or standardized tests. The remaining approaches
include: a) the readability approach, b) the skills perspective, c)
the introspective report, and d) the models approach.

Readability approach. The concept of readability is
attractive for its simplicity. Readability ratings are an attempt to
somehow measure the difficulty of a book. Since every teacher
has watched a youngster flounder through some book, it makes
sense to try to find a book that each student can read and
understand without so much effort that the student quits before
completing the task. However, the practice of readability
measurement does not work as teachers would hope (Hansell, 1974,
1976a, 1976b).

Formulae such as those by Dale and Chall (1948), Fry
(1968), and McGlaughlin (1969) focus on things which can be
easily counted in a book: letters, words, syllables, sentence
length, affixes, and so on. These countable items reflect less than
five percent of what people have said make a book easy to read
(Gray & Leary, 1935). As a result, it would not be surprising to
find that a book with a readability rating of 7.3 is easier to read
than one rated 6.8 for some students. Similarly, not all
seventh grade students will be able to read and understand a
book with a readability rating of 6.8. As teachers know, students
differ on any dimension we choose to measure. There is no
guarantee that we can match readability levels of books and
standardized reading test scores accurately.
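To make concrete what "easily counted" means, the sketch below computes two typical surface counts for a passage. It does not implement any of the published formulae named above: the function names are hypothetical, and the syllable count is a rough vowel-group heuristic assumed only for illustration.

```python
import re


def estimate_syllables(word):
    """Rough heuristic: count groups of consecutive vowels (at least one)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def countable_features(text):
    """Return surface counts of the kind readability formulae rely on."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(estimate_syllables(w) for w in words)
    return {
        "words_per_sentence": len(words) / len(sentences),
        "syllables_per_word": syllables / len(words),
    }
```

Nothing in these counts reflects the reader's purpose or familiarity with the topic, so two books with similar counts can differ widely in difficulty for a particular student.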

Readability does not provide information about how
youngsters will read a text or their familiarity with the topic
treated in the text (Kintsch & Vipond, 1978). Comprehension,
however, is the result of a meaningful interaction between the
student and the text. In this process both student and text are
important. Readability formulas may serve to sensitize teachers
to examine books more closely, but if they lead them to ignore the
students' approaches to this particular text their use might be
destructive. A better way to determine the readability of a text
might be to use some of the informal procedures described later in
this paper.

Skills perspective. In contrast to readability formulae
which focus on the text, the focus of the skills approach is on each
student. The skills of content reading have been defined by
analyzing classroom tasks and, thus, are practical. The skills
approach to informal evaluation of content reading ability is most
common by far; virtually every text on elementary, content, or
secondary reading includes a list of so-called skills.

These lists range from what Herber and Riley (1979) call
the "simplest form" of 1) vocabulary, 2) comprehension, and 3)
reasoning, to a composite list which includes the following topics:

identify main idea of paragraph
identify main idea of selection
summarize
outline
put ideas in sequence
details of paragraph or passage/grasp directly stated details
locate information
make inferences
follow directions
draw conclusions
appreciate character
understand setting
recognize author's purpose
identify attitudes that the author is trying to convey
identify words that author chooses to achieve purpose
define key words
define words in content
syllabify
accent
identify meaning of affixes
identify meaning of roots
use synonyms, antonyms
choose best definition from dictionary
sense variation among words
identify part of speech
recognize sentence structure
recognize paragraph structure
see relations among ideas in passage
    time and place - events
    main idea - details
    compare - contrast
    hierarchy
    cause - effect
apply theoretical information
apply ideas
determine relevance of ideas
determine accuracy of information
think through the passage/anticipating outcomes
organize ideas
using parts of a book
    contents
    index
    glossary
    introductory paragraphs
    biographical data
note taking
use of card catalogue
use of dictionary
knowledge of indexes and abstracts

The skills perspective generates such informal evaluation
instruments as observational checklists, placement tests
(coordinated with materials such as workbooks, kits, basal readers,
taped programs), and a wide variety of teacher-designed tests.
Guidelines provided by Burmeister, 1978; Shepherd, 1978;
Strang, 1964; and Thelen, 1976 reflect this type of approach. With
variations they tend to suggest that teachers administer a group
inventory including 20 to 35 questions about: 1) the book in
general (size, shape, color, length, organization into chapters and
units); 2) parts of the book; 3) vocabulary (which may be from the
dictionary, knowledge of synonyms to define terms, and use of
context); 4) word recognition (limited to syllabication, accent, and
meaning of roots and affixes); and 5) comprehension and rate of
reading. Burmeister suggests five questions about each of three
comprehension categories: details, main ideas, and questions
which require students to interpret and use information from the
text. Shepherd would add questions about sequence of events and
drawing conclusions, to which Strang would add organization of
details and following directions.

Simons' critique (1971) of the skills approach to
comprehension seems to apply to the skills approach to evaluation:

1. There is confusion about what can be called reading
comprehension. Most observers would probably agree that the
ability to relate ideas in a passage is vital to reading
comprehension and, therefore, should be evaluated. But what about skills
such as notetaking or selecting the best definition from a
dictionary? These skills, although useful, would probably generate a
greater amount of disagreement among professionals. Obviously,
those skills which are thought to be a part of reading, and of the
content area subject, need to be taught and should be evaluated.

2. Another criticism leveled by Simons is that skills lists
often contain global or poorly defined terms. This points out that
one person's "recognizing a sequence of events" is another
person's "recall of details." Various authorities might agree that a
particular type of question should be included in an assessment;
but because these skills lists are the subjective products of
armchair logic, there are going to be gaps, overlaps, and
disagreements over terminology.

3. Simons' final criticism is that there is no distinction
between the product of comprehension (outlining) and the processes
by which the product is achieved (identifying main ideas).

4. The skills approach suffers from other limitations as
well. Another problem with the skills approach is that such tests
may fragment the process of learning unnecessarily. If, for
example, a student can outline a passage from a text, he/she can
obviously identify main ideas, locate details, draw conclusions, and
perceive the organization of ideas within the passage. Conversely,
if a student cannot outline a text, there is little evidence available
that working on one or more of the skills mentioned above will
transfer directly to the task of outlining. The assessment of
content area reading should probably begin with more global tasks
(completing a recipe or outlining a passage) and then become
more specific in intent if students are unable to successfully
complete the task.

5. A final limitation of the skills approach is its focus on
the student as opposed to the text or task. As has been noted,
comprehension is best described as the product of a meaningful
interaction of a student and text. This approach suggests to some
that skills lists refer to genuine internal abilities of the individual
which have little to do with a specific task or content area. Such
perceptions often lead to assessments that are irrelevant to the
task of interest. For example, the teacher who relies on the results
of a general vocabulary knowledge test to specify which students
are apt to have difficulty with the technical vocabulary in the
biology text might be badly misled. Certainly, both the general
vocabulary and the specific technical vocabulary fall within the area
of vocabulary knowledge, but this skill would not be expected to
generalize. The best content area assessments require students to
carry out tasks similar in complexity to what will typically be
expected of them, using similar materials, and their performance on
these tasks is then used to suggest what instructional steps are
necessary to accomplish the instructional goals.

Introspective report. In contrast to the product oriented
skills perspective, the introspective approach focuses on what the
student has done to reach the goal, on what a student feels is easy
or difficult about reading, and on each student's study habits.
Strang suggests that after students have read an assignment in
class they should be asked questions such as:

What did you do to get the main idea?
What did you do to remember the details?
What did you do when you met a word you didn't know?

While introspective questions may help a teacher gain
insight about each student's reading, the act of introspection is not
without problems. A basic question raised by Simons concerns
the relationship between the actual process of identifying main
ideas and the verbal description. The same process may be
described in several ways, but a change in description does not
change the actual process. Students may also describe different
mental operations by using the same words.

Introspective accounts are also retrospective. That is,
students are asked to describe the reading process after they have
read. Perhaps introspective statements (of how main ideas were
identified, for example) are influenced by the fact that the passage
has been completed.

The introspective perspective is different from the skills
perspective in that it focuses on the process of understanding as
opposed to the products or answers. At the same time, the
introspective measures may deal with many of the same aspects of
content reading (main ideas, details, and vocabulary). As with the
skills approach, the teacher gains the greatest information when
requiring students to complete tasks with the specific content
materials of interest.

The models approach. The models perspective of
comprehension differs from the other approaches in its attempt to
interrelate what may be described as separate skills and to explain the
interrelationships of the comprehension process and product.
Models of comprehension frequently take the form of flow charts
or diagrams (Singer & Ruddell, 1976). One of the most explicit
models of understanding discourse is a computer program
designed by Winogard (1974) which can follow directions and
answer questions. Emphasis on models of comprehension is an
attempt to come a step closer to the goal of developing one or
more theories of how people understand information. The models
approach suggests viewing student behavior in a variety of
reading-thinking situations on the basis of the relationships and
concepts that are part of a specific model of reading. This
approach treats reading comprehension as a global, integrated act,
and not just as a set of unique and diverse skills. The difficulty
with a models approach to diagnosis is that teachers must
understand a theoretical model before they can use it to test and
guide instruction. In addition, there is no theory of reading which
can be considered complete at this time.

Diagnostic Instruments 

For classroom use, the best method is the combination of
approaches which is easiest and provides the most usable
information. In classroom or clinical use, many distinctions between
approaches disappear. Nonetheless, the following section is
designed to present sample diagnostic instruments based on
stated goals and to point out how each approach may add to a
teacher's repertoire. The first section will deal with sampling
vocabulary knowledge from the skills, introspective, and models
viewpoints. The second section will show how these approaches
might assess reading comprehension.

Sample vocabulary instruments

Situation: Ninth grade general English class
Goal: Increase students' ability to communicate and understand by
      increasing general vocabulary knowledge
Diagnostic choices:

I. Skills approach
   A. For each of the words below, underline the root word and
      list three to five words which have the same root.
      1. vision
      2. bicycle
      3. perimeter
   B. Define each underlined word in the sentences below
      according to the way it is used in the sentence.
      1. As I walked home from the football game, I
         had a vision of what life could be like in 2050.
      2. The perimeter of the army camp was well
         guarded.
   C. Select the best dictionary definition for the underlined
      word in each sentence.
      1. According to the paper, we can now bicycle in
         the park.
         a. n. a vehicle usually designed for one person
            consisting of a frame, two wheels, a seat,
            handlebars for steering, and two pedals or
            a motor by which it is driven.
         b. intr. v. to ride or travel on a bicycle.
         c. adj. having two cycles.

II. Introspection
   A. What does it mean to you when I say, "The perimeter of
      the wheel is 63 centimeters"?
   B. Rate each of the following words on a scale of 1 to 4.
      Let 1 mean I've never heard of it.
          2 mean I've heard of it but can't define it.
          3 mean I can define it if I hear it in a sentence.
          4 mean I know it, I can define it, and I can use it.
                                                  (Dale, lSTO)

III. Models approach
   A. List as many words as you can think of that are
      associated in any way with the following words.
      Example: milk, cookies, chocolate, white, cow,
               dairy, farmer, baby, cheese, sour, ice
               cream, butter
   B. List all the ways you can find in which the objects
      represented by the following words are alike.
      bicycle, car, log, trailer, sewing machine,
      eyeglasses, dime
   C. Play Dictionary Poker or Glossary Guesswork by trying
      to write a definition on a 3 x 5 card for one of the following
      words which you do NOT know. Your definition will be
      mixed with other students' definitions. The real definition
      will also be added. Each student will then have a chance
      to vote on which definition he/she thinks is "real." One
      point is awarded for each student who guesses your
      definition (i.e., each person you fool). Two points are awarded to
      each person who correctly votes for the "real" definition.
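The scoring rule for Dictionary Poker can be tallied mechanically. The sketch below is one possible reading of that rule: the function name, the data layout, and the "REAL" label are hypothetical, and a vote for one's own definition is assumed to earn nothing.

```python
def score_dictionary_poker(authors, votes, real="REAL"):
    """Score one round of Dictionary Poker.

    authors: dict mapping a definition id to the student who wrote it
    votes:   dict mapping each voter to the definition id he/she chose
    real:    id of the real definition (a hypothetical label)
    """
    scores = {student: 0 for student in set(authors.values()) | set(votes)}
    for voter, choice in votes.items():
        if choice == real:
            scores[voter] += 2  # correct vote for the real definition
        elif choice in authors and authors[choice] != voter:
            scores[authors[choice]] += 1  # the author fooled this voter
    return scores
```

For example, if Ann wrote definition "d1", voted for the real definition, and two classmates voted for "d1", Ann would earn four points under this reading.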

These first examples were based on diagnosing vocabulary
knowledge as separate from reading comprehension. Though the
examples relate to general vocabulary, the formats can be directly
adapted to any content area. Instead of "bicycle," the same tasks
could be carried out with protractor, proton, or proletariate.

One factor should be apparent about informal vocabulary
measures of the introspective and models types: there are no
single "correct" answers. The personal meaning, the rating scale, the
association task, finding similarities, and Dictionary Poker all
require a student to actively search and organize thoughts in relation
to a vocabulary term. If these activities are inconsistent with
teacher goals, then the instruments should not be used.

It should also be apparent that none of the diagnostic
vocabulary activities deal with pronunciation or what is called
"word recognition." Pronunciation assumes secondary importance
to meaning recognition in content reading. Typically at
stage 1 of Dale's rating scale (never heard of it) we cannot figure
out the pronunciation of a word without assistance from an outside
authority, be it teacher or pronunciation guide. Pronunciation
of a term usually indicates that a student is at least at stage
2, having heard of the word. It seems apparent that going from a
state of having heard of a word to a state of knowing what it means
and how to use it requires time and effort. As with any topic
learning, an individual with more background information about a
topic will have an easier time achieving mastery (Pearson, Hansen,
& Gordon, 1979).

The next section deals with assessing reading comprehension
from three viewpoints. Since goals should determine diagnostic
instruments, two different situations are given.

Sample comprehension instruments

Situation: Seventh grade science class 

Goals: To increase student knowledge of types of animals.

       To increase student interest in science.

I. Skills approach

A. Preview or survey reading

   Take five minutes to look through the chapter.
   Then answer the following questions:

   1. How many major types of animals are described
      in this chapter?
   2. What are the names of the major types of
      animals?
   3. What points do scientists use to put animals
      into different groups? For instance, what
      points do scientists look at to put a snake into
      a different group from a dog?

   4. Write down the numbers of each study question
      on page 107 that can be answered by
      reading just the subheadings.

B. Relating ideas

   Read the section entitled Marsupials (pp. 109-113). Then
   use the words in the word box to complete the outline.
   Marsupials

   Word Box

   Attach to nipple.
   Where they live.
   Opossum will eat almost any food.
   Crawl to pouch.
   What they eat.

   1. Young are raised in a pouch in the mother's body.
      a. Born alive
         1) ____________________
         2) ____________________
      b. ____________________
         1) Almost all live in Australia area.
         2) Opossums are only marsupial in the U.S.
      c. ____________________
         1) Some marsupials eat only insects.
         2) Kangaroos eat only plants.
         3) ____________________

C. Main idea

   Reread the second paragraph on page 175 which begins
   with the words, "A marsupial is ...." Decide which
   sentence below best states the main idea. Circle the
   number before the sentence you choose.

   1. Opossums live in the United States.
   2. Almost all marsupials live in or near Australia.
   3. The Tasmanian Devil is a marsupial.
   4. Kangaroos are the biggest marsupial.
   5. Marsupials live in a pouch after they are
      born.

D. Understanding graphics

   Look at the diagram on page 121. Answer the following
   questions:

   1. This diagram is about ____________________.
   2. T F A pachyderm is an animal.
   3. T F A pachyderm is a mammal.
   4. T F A zebra is a pachyderm.
E. Recalling details

   From memory, list or pick from a list the names of three
   marsupials.

   Name the only marsupial that lives in the United States.

   Name the largest marsupial.

   Give at least two reasons why the number of marsupials
   in the world has decreased.

F. Use and apply information you have learned about the
   ways marsupials a) are born, b) are cared for when
   young, c) eat, d) move from place to place, e) reproduce,
   f) adapt to their environment, and g) defend themselves.

   List at least five ways a kangaroo is like a dog.
   They both ____________________.

G. List at least three ways a kangaroo is like a person.
   They both ____________________.

H. List at least three ways a kangaroo is different from a
   dog.
   A dog ____________________ but a kangaroo ____________________.

I. If you found babies in the pouch of an opossum that had
   been killed by a car, what do you think the SPCA or the
   Natural History Museum would suggest you do to care
   for them? Specifically, what would they eat? What
   should you provide in their cage?
II. Introspective

   Implicit in the introspective questions is the fact that the
   students have been asked to survey the chapter and to make
   or complete an outline. This is not meant to imply that
   introspective instruments are tied to the previous skill
   instruments, but are used merely to provide continuity.

   1. List the things you looked at when you had five minutes
      to look over the whole chapter.
   2. Briefly explain how you completed the outline (note: this
      may be done orally, in which case the teacher will keep
      brief notes).
   3. What did you do as you were reading to help yourself
      remember? (See note above.)
   4. Look back over the section on marsupials. If there were
      any parts which you did not understand, write the page
      number, paragraph, and first few words.
   5. What did you do to try to make sense of the points you
      didn't understand?

Situation: Tenth grade social studies (Land Use in
America in the 20th Century)

Goals: Develop understanding of interrelationships between
       people and nature.

       Discover the range of variables to be considered in
       planning changes required by increasing population.

       Formulate a general plan for making decisions
       about issues of land use.

III. Models

   1. Cloze procedure
      Simons lists the cloze procedure as a diagnostic test based
      on the theoretical principle of closure, the tendency for
      humans to complete what they see as incomplete. The
      cloze procedure consists of a portion of a text, generally
      about 300 words, in which every fifth word has been
      deleted and replaced with a blank. The student's task is
      to complete the passage.

      Since cloze is based on student response to a text, it is
      also an appropriate measure of readability. Since both
      the procedure and related research are described by
      Pikulski and Tobin within this monograph, no further
      prescription will be presented here.
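
The every-fifth-word deletion rule described above is mechanical enough to sketch in a few lines. What follows is a minimal illustration only, not the construction procedure given by Pikulski and Tobin: the function name, the blank string, and the choice to delete a word together with any punctuation attached to it are assumptions made here for brevity.

```python
def make_cloze(text, n=5, blank="_____"):
    """Replace every n-th word of text with a blank.

    Returns the cloze passage and the list of deleted words,
    which can serve as an answer key.
    """
    passage = []
    answers = []
    for position, word in enumerate(text.split(), start=1):
        if position % n == 0:
            answers.append(word)   # keep the deleted word for the key
            passage.append(blank)
        else:
            passage.append(word)
    return " ".join(passage), answers


sample = "one two three four five six seven eight nine ten eleven"
cloze, key = make_cloze(sample)
print(cloze)
print(key)
```

A teacher preparing a real passage would probably leave the opening sentence intact and trim punctuation from the answer key by hand; this sketch skips both steps.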

   2. Hypothesizing

      Before reading or discussing the topic:

      a. Pretend you are writing a chapter for a book
         entitled "Nature's Limits on Land Use." Write
         down three or more subtitles that you would
         include in the chapter.

      b. Pick one of your subtitles and briefly explain why
         you would include it in the chapter.

   3. Background knowledge

      a. What land use decisions do you know of which
         have been in the news in the past three years?
      b. Circle the number before any of the following
         issues which you have heard or read about.

         1. (Love Canal) waste disposal dispute
         2. (name) nuclear power plant protests
         3. (name) shopping center dispute
         4. (name) highway construction dispute
         5. (name) housing project dispute
         6. (name) water rights dispute

   4. Strategy formation

      a. Pretend that you are serving on a zoning board.
         You meet once a month. At every meeting you are
         asked to make decisions about how to use land.
         Usually some people want a change and other
         people want no change. List the criteria (points you
         would consider) which you would use to make the
         decisions.

      b. Describe the steps you follow if you run across a
         point in your reading which you do not understand.

   5. ReQuest procedure

      The ReQuest procedure (Manzo, 1973) is a task where
      students read a sentence or two and then exchange
      questions with the teacher. That is, each student has a chance
      to ask any question he/she chooses, then the teacher has
      a chance to ask questions. After several reciprocal
      sequences, students are asked to guess at the remaining
      content and read to check those guesses. Tape recordings
      of small group (5 to 8) ReQuest sessions will provide
      material to analyze reading ability and reading strategy
      in appropriate content material.

As in the case of vocabulary teaching, it is clear that
students have a wider range of acceptable responses to the
introspective and models questions. Therefore, diagnosis by
introspective or models perspectives takes more time to evaluate.
As any teacher knows, time is a most precious commodity.
However, it is also possible to gather information about student
progress as related to different goals through each of the three
viewpoints. Therefore, the most efficient (in terms of time and
information) means of gathering information about what students
can do in relation to a teacher's goals depends not on the
instrument but on the teacher goals.

Conclusions 

In summary, informal evaluation of content reading is the
act of discovering what a student can do in relation to a content
teacher's instructional goals. Diagnosis is an intermediate step
which logically falls between establishing clear goals and
planning of classroom operations. Informal diagnosis of content
reading ability may be viewed from the perspectives of
readability, skills, introspection, or models of reading; but the
instruments selected should provide information about each
student in relation to the teacher's goals. With the approaches
described above, diagnosis is not a one-time test, but may be
carried out by careful evaluation of ongoing classroom activities. As
long as teachers have clear goals and a measure of what a student
can do, they are prepared to plan effective instruction.

Informal Approaches to
Evaluating Children's Writing

Ronald L. Cramer

Oakland University

All of the other papers included in this volume address
themselves to the evaluation of reading skills. One might ask
why a paper is included which focuses upon the evaluation of
writing. The basic reason is that writing and reading skills are
highly related (Shanahan, 1980) and a substantial amount of
professional opinion (Bush & Huebner, 1970; Combs, 1977; Durkin,
1976) as well as experimental evidence (Hunt, 1965; Stotsky,
1975; Zeman, 1969) suggests that the mental and language
processes involved in written production of materials are the same or
very similar to those involved in comprehending written
materials. Thus, children's written compositions may mirror some of
the skills or weaknesses that exist in reading comprehension and,
therefore, offer one more avenue for making diagnostic
judgments. This is not for a moment to say that the evaluation of
writing is not a valued activity in itself. However, given the
thrust and purpose of this volume it seems important to
explicitly point out the well-documented relationship that exists
between reading and writing.

The focus of the evaluation procedures discussed in this
paper is to "provide information about the teaching and learning
processes implicit in writing." Three different and valuable
approaches to the evaluation of writing skills will be discussed:
teacher-evaluation, self-evaluation, and peer-evaluation.

Teacher-Evaluation: Holistic and Analytic Approaches

Holistic evaluation is a method of assessing writing to gain
a global impression of its quality. In holistic evaluation each
piece of writing is evaluated within two or three minutes. The
evaluation is guided by criteria which specify what writing skills
to consider; criteria are also developed which describe low,
middle, and high levels of achievement in several skill areas related to
writing. Such skill areas might include the quality of
organization, the structure of sentences, and the use of correct
punctuation. The purpose of holistic evaluation is to assess writing as a
whole rather than to consider every detail. Consequently, writing
deficiencies or strengths are not counted or quantitatively
analyzed; general judgments are made and achievement is ranked
on a holistic scale. For example, holistic evaluation of
punctuation skills would not require counting the exact number of errors.
Rather, a decision would be made as to whether the punctuation
merits a low, middle, or high ranking on a holistic scale designed
to make holistic judgments possible. A holistic judgment
regarding punctuation might be: 1) there are many punctuation errors,
so rank this piece low on punctuation; 2) there are a few punctuation
errors, so rank this piece of writing in the middle on punctuation; or
3) there are hardly any punctuation errors, so rank this piece of
writing high on punctuation. Rather than counting errors in
punctuation, general guidelines such as the following are used to
arrive at the ranking:

High: Consistently ends sentences with appropriate punctuation.
Has strong control of internal punctuation and other less common
punctuation. May experiment with punctuation marks not yet
fully mastered.

Middle: Usually ends sentences with appropriate punctuation.
Attempts to use internal punctuation, but makes some errors.
Does not have control of the less common types of punctuation,
but sometimes attempts to use them.

Low: Often fails to use ending punctuation correctly. Seldom
uses internal punctuation. Less common punctuation is almost
never used correctly.

The final judgment is quickly recorded on a
checklist of writing skills for holistic evaluation.

There are two dimensions along which written materials
can be evaluated within a holistic evaluation framework:
composing skills and mechanical skills. Because of the basic differences
in style, content, and purpose of expository as compared to
narrative materials, different aspects of composition skills need to be
employed, depending on the nature of the writing. The charts
shown are reproduced with permission from Scott, Foresman,
1981, and represent a summary of dimensions that can be used to
evaluate narrative and expository writing holistically.

To facilitate the holistic evaluation of writing skills, it is
recommended that teachers make copies of the charts shown,
excluding the descriptions. Separate sheets could be prepared for
expository materials or narrative materials. An abbreviated
sample of an evaluation sheet is shown.



Evaluation Form for Narrative Writing

                      (1)       (2)       (3)
                      Low       Middle    High

STORY STRUCTURE

It should be clear that children whose writing reflects good
quality of organization of ideas will be likely to make use of the
organization inherent in written materials that they read as an
aid to comprehension. The child who uses good punctuation is
very likely to be able to correctly interpret punctuation as an aid
to reading comprehension. Essentially all of the qualities
reflected in the standards listed have implications for better
understanding the language and thinking processes common to
reading and writing.

In contrast to holistic evaluation, analytic evaluation is a
detailed counting and commenting on writing; and, unlike
holistic evaluation, it is not dependent on general impressions,
but on detailed analysis of each strength or weakness found in a
piece of writing. The standards cited above can serve as the basis
for such commenting, so a dimension list will not be repeated here.
In analytic evaluation, for example, punctuation, grammatical,
and usage errors are corrected; other writing problems are noted
and commented on in as much detail as seems necessary.
Undoubtedly, both analytic and holistic evaluation have
utility. However, analytic evaluation is too time consuming to
perform on every set of papers. Indeed, analytic evaluation of
each piece of writing is impossible when children write frequently
or when class size is large. The impossibility of analytic
evaluation is obvious when the logistics are considered. For example, if
an English teacher with 150 students spends ten minutes on each
piece of writing, 25 hours would be required to evaluate one set of
papers. How often can a teacher spend this amount of time
analytically evaluating one set of papers? One solution is to use
more efficient holistic procedures. A teacher skilled in the use of
holistic procedures can reliably evaluate 150 papers in six hours
or less. Once teachers have learned to use holistic evaluation they
can assess a greater volume of writing than analytic procedures
alone permit. Clearly, a balance between holistic and analytic
evaluation is needed. A balanced allotment of time would be to
evaluate about 75 percent of writing holistically and 25 percent
analytically. This balance is especially appropriate in classrooms
where children write frequently. Of course, not every piece of
writing produced need be evaluated. There are legitimate writing
assignments, such as certain types of journal writing, which
require no teacher evaluation.
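
The arithmetic behind these figures is easy to check. The short sketch below reproduces the 25-hour analytic estimate and, as an illustration only, estimates the time for one set of 150 papers under the suggested 75/25 balance; the six-hour holistic figure is the upper bound quoted above, and applying the percentages to a single set of papers (rather than across a term's writing) is an assumption made here.

```python
def hours_needed(papers, minutes_per_paper):
    """Total evaluation time in hours for one set of papers."""
    return papers * minutes_per_paper / 60


analytic_hours = hours_needed(150, 10)   # ten minutes per paper
holistic_hours = 6                       # upper bound quoted in the text

# Illustrative blend: 75 percent of the set read holistically,
# 25 percent read analytically.
blended_hours = 0.75 * holistic_hours + 0.25 * analytic_hours

print(analytic_hours)   # 25.0
print(blended_hours)    # 10.75
```

On these assumptions the balanced approach takes well under half the time of a fully analytic reading of the same set.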

Self-Evaluation: Guidelines and Activities

Self-evaluation is the ability to improve one's own writing
through self-directed editing and revision. It is the ability to look
at a piece of writing holistically and conclude that it needs general
improvement. It is the ability to look at a piece of writing
analytically and locate the details that need correction or refinement.
Self-evaluation is the ultimate writing skill. Careful, critical
examination of one's initial impression or conclusion derived from
what one reads may be the hallmark of a critical reader.
To edit or revise means to improve writing until it
conforms to an acceptable standard of excellence. Standards of
writing refer to the generally accepted writing conventions. These
include technical matters such as grammar, usage, and mechanics
as well as the more substantive writing skills, such as
organization and paragraph structure. The standard of writing must be
flexibly administered so that children's current writing ability
and previous writing experience are taken into account.

Editing ability grows as children internalize the writing
standards taught in the language program. When editing is
taught in a variety of interesting and motivating ways, children
develop the ability to examine their own writing critically.
Writers should learn how to improve their own writing, even
though some will never become outstanding self-editors.

Teaching editing is a challenging task. It is more difficult
to teach children to evaluate their own writing than it is to
evaluate for them. Teachers who have accepted the challenge
have found that editing offers the best prospect for substantial
writing growth. Of course, teacher evaluation must continue as
part of the total evaluation program, but teachers must not
waver from the ultimate goal: Teach children to be their own
editors.

The following guidelines have been used by teachers who
have succeeded in helping children to write freely and edit well.

1. Using various activities to stimulate self-editing. Editing
   activities should place children in various roles which require them
   to make judgments about their own writing and that of others.

2. Modeling editing behavior. The modeling of editing takes place
   when the teacher informally comments on children's writing,
   during conferences with children about their writing, and when
   teaching editing in whole-class or group situations. It is essential
   to be sensitive, appreciative, and accurate in dealing with the
   personal writing of children.

3. Encouraging children to listen to their own writing before
   editing it. This may be done by working with a partner, by
   reading writing aloud, or by recording the writing and playing it
   back. Minor problems can be spotted immediately in this way,
   and, with experience, children will also learn to detect more
   serious writing problems.

There are many activities for stimulating editing. A few
that have worked well in classrooms follow.

1. Teaching editing regularly in editing workshops. The editing
   workshop is a structured procedure for teaching editing skills.
   The procedures for teaching editing workshops are presented in
   detail under the discussion which follows on peer-evaluation.

2. Having children write questions about the important ideas in
   their writing. A partner reads the account and listens to the
   question. The writer and the partner discuss any problems
   encountered. The discussion should lead to decisions about
   rewriting.

3. Pairing a fifth or sixth grade class with a first or second grade
   class. The older children act as editors and authors for the
   younger. After instructing the older children in the techniques of
   editing, have them help younger children with their writing.
   Young children often react more positively to an older child than
   to an adult.

4. Instructing the children to underline certain words in their most
   recent writing. They might, for example, be told to underline
   words that might be changed for more exact, vivid, or lively
   descriptions; or they might be directed to use a thesaurus or
   dictionary to aid in precise word selection.

5. Placing editing charts in key places within the room. Children
   need help in learning to use specific elements listed in the charts
   to check their papers. The charts should cover two basic areas:


Composing Skills Chart
Did I say what I wanted to say clearly?
Did I choose the exact wording so others will understand?
Did I arrange paragraph details in logical or interesting ways?
Is each sentence well formed?
Does each paragraph have a main idea and supporting details?
Did I use more words than necessary?
Did my story have a clear beginning, middle, and ending?
Did I make the people and events real and interesting?

Mechanical Skills Chart
Does each sentence end with the correct punctuation?
Did I use punctuation in other appropriate places?
Did I capitalize the first word of each sentence?
Did I capitalize other appropriate words?
Did I spell words correctly and check words I was unsure of?
Did I write in my best handwriting?

These charts are general; more specific charts can be made to
fit certain situations. However, editing charts are useless unless
children have been taught how and when to use them.

A major responsibility in teaching writing is helping children
to learn the skills of editing. Successful teaching of editing
requires attention to detail, careful planning, and a general writing
program that makes authorship an exciting enterprise without
sacrificing discipline and responsibility.
Peer-Evaluation: Purpose and Use

When small groups of pupils work together to improve one
another's writing, they are engaging in peer-evaluation.
Peer-evaluation is a group editing experience intended to improve the
writing of each individual child. It benefits the writing program
in three ways:

1. Peer-evaluation improves writing, as research by Lagana
   (1972) and others has shown. Research in peer-evaluation
   shows that improvement has occurred in such areas as
   grammatical usage, organization, sentence revision, theme writing,
   and critical thinking. Interestingly, writing improvement
   brought about by peer-evaluation may be equal to or greater
   than improvement resulting solely from teacher evaluation.
2. Peer-evaluation helps pupils develop benchmarks against
   which to judge the quality of their own writing.
   Peer-evaluators are directed to look for the presence or absence of
   specific writing features in the writing of their peers. As
   pupils evaluate the writing of their peers they develop greater
   awareness of what makes their own writing understandable to
   others. Practice in applying writing skills in evaluation
   sessions helps pupils understand how these skills apply to their
   own writing and editing habits.
3. Peer-evaluation broadens the audience for each child's
   writing, thus giving an additional incentive for writing. Since
   pupils relate best to their peers, it seems reasonable that some
   writing should be evaluated by this natural audience.
   Broadening the audience for writing also stimulates children
   to select a wider range of topics and may encourage more
   sincere and forthright language expression.

Peer-evaluation has succeeded best where these three
challenges have been squarely faced. First, pupils must be taught
to evaluate writing sensitively and accurately. Second, pupils
must be shown how to work together harmoniously in group
settings. Third, teachers must be willing to trust pupils with the
task of evaluating writing. When teachers face these challenges
and are prepared to work hard to accomplish them, children's
writing will improve.

Three steps are recommended for implementing a
peer-evaluation program:

1. Teach the procedures for evaluating writing as a whole class
   activity prior to having pupils work independently in groups.
   Tell pupils they will be given a Writing Workshop for learning
   how to evaluate their own writing and the writing of others.
   Follow these procedures:

   a. Give a writing assignment on the day preceding the
      Writing Workshop. Have each pupil complete the writing
      assignment in first draft form.

   b. Select one pupil's writing assignment (it is essential to
      secure permission and assure anonymity) and make a
      transparency. Project the draft material onto a screen. Tell
      the pupils to read the draft, then ask, "What are some
      things that have been done well in this draft?" List
      responses on the board. Initially, pupils often single out the
      mechanical technicalities of writing.

   c. Tell the pupils to read the draft again. Then ask, "What are
      some things that should be changed to improve this
      draft?" Make the suggested changes on the transparency
      with a grease pencil and list them on the board.

   d. Comment on each suggestion in a casual but informative
      manner. Comments should include information directly
      related to good writing practices as well as praise for
      thoughtful and accurate suggestions. No pupil's honest
      effort should go without acknowledgment.

   e. Assign one or two items from the lists for pupils to
      evaluate in each other's drafts. For example, pupils may be
      assigned to work in pairs to look for sentence fragments in
      each other's drafts.

   f. As the pupils work, circulate among them offering
      instruction and praise. For example, if a pupil cannot locate a
      sentence fragment, show the pupil where the problem is
      and explain how to recognize it as a fragment. Pupils will
      often discover strengths and weaknesses in each other's
      writing that they were not assigned to find. This behavior
      should be praised and rewarded. Other children will imitate
      this responsible behavior and some pupils will soon be
      doing a more thorough job. Of course, official responsibility is
      still limited to the specific task assigned for this particular
      writing workshop.

2. After pupils have gained evaluation experience through the
   writing workshop, they will be prepared for the more
   challenging peer-evaluation experiences described below:

   a. After pupils have completed a writing assignment,
      organize them into groups of four to evaluate the
      paragraphs they have written. Have pupils use specific writing
      criteria, such as those shown below, to evaluate the
      paragraphs they have written.
      Does the paragraph have a topic sentence that states the
      main idea?

      Is the topic sentence at the beginning, middle, or the end
      of the paragraph?

      Is each supporting detail related to the topic sentence?

      Is the punctuation and capitalization in each sentence
      correct?

   b. Have pupils make corrections and editorial comments on
      the paper each is evaluating. Explain that comments
      should be related to how well the evaluation criteria have
      been met for this particular paragraph writing assignment.

   c. Have pupils rate the paper using a three point scale similar
      to the one given below. The rating system in the scale is as
      follows:

         Low    = 1 (The bottom 25 percent)
         Middle = 2 (The middle 50 percent)
         High   = 3 (The top 25 percent)

      First, familiarize pupils with the purposes and functions of
      a rating scale such as the one described. Once pupils
      understand how such a scale works, they will have little
      difficulty using it effectively.

   d. Pupils exchange papers once again within their group. The
      second evaluator performs exactly the same functions
      described in steps b and c above. The purpose here is to have
      two different evaluations of each paper within the group.

   e. Return the papers to the original writers for a rewrite and
      preparation of the final draft. Encourage pupils to discuss
      the editorial comments and ratings they have given.

   f. After the final draft is prepared, reassemble the groups.
      Have pupils read each paper. Direct a discussion
      concerning the effectiveness of their evaluation work.

   g. Have the pupils decide which of the four papers within
      their group best conforms to the criteria used to evaluate
      the work.

   h. Collect the final drafts and assign final grades if you so
      desire. Of course, it is not necessary to the peer-evaluation
      process that this be done.

   i. Since work that reaches the final draft stage often deserves
      a wider audience than it normally receives, the instructor
      may wish to have the class discuss ways in which this may
      be accomplished, such as through a class newspaper,
      bulletin boards, or even a Young Author's Conference.

3. The final stage of peer-evaluation involves groups of pupils
   jointly producing and editing special project writing
   assignments. For example, pupils may jointly write and edit a
   play, research report, story, or other work. However, this
   should not be attempted until the groups are working
   together harmoniously and effectively at the Step 2 level.
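
A teacher who wants to keep a simple record of the two ratings each paper receives in Step 2 could tally them along the following lines. This is a hypothetical bookkeeping aid, not part of the procedure itself: the function name, the paper labels, and the idea of averaging the two ratings are assumptions, and the choice of the best paper in step g remains a judgment the pupils make through discussion.

```python
def summarize_group(ratings):
    """Average the evaluators' ratings for each paper in a group.

    ratings maps a paper label to the list of ratings it received
    on the three point scale (1 = Low, 2 = Middle, 3 = High).
    Returns the averages and the label of the highest-rated paper.
    """
    for paper, marks in ratings.items():
        if not marks or any(mark not in (1, 2, 3) for mark in marks):
            raise ValueError(f"invalid ratings for paper {paper!r}")
    averages = {paper: sum(marks) / len(marks)
                for paper, marks in ratings.items()}
    top_paper = max(averages, key=averages.get)
    return averages, top_paper


group = {"A": [2, 3], "B": [1, 2], "C": [3, 3], "D": [2, 2]}
averages, top_paper = summarize_group(group)
print(averages)
print(top_paper)
```

A record of this kind may also show where two evaluators disagreed sharply, which can be a useful starting point for the group discussion in steps e and f.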






Peer-evaluation teaches children the basic skills of writing
by having them edit and later produce written work within a
group setting. The genuine audience that peer-evaluation
provides is a powerful stimulus for learning. Implementing
peer-evaluation requires considerable skill and dedication, but the
rewards are often beyond the teacher's most optimistic
expectations.

Conclusion and Summary

Evaluation is often thought of as a way to assess levels of
achievement in writing in order to assign grades. Indeed,
evaluation has this legitimate function. However, this paper has
considered evaluation in a different light. Evaluation can also guide
and inform teaching and learning. When children learn to revise
their own writing and that of others, they acquire evaluation and
writing skills simultaneously.

Teacher-evaluators can use holistic evaluation to gain
quick impressions of writing. These impressions guide and
inform group or individual writing instruction. Analytic evaluation
achieves a similar purpose. However, analytic evaluation is more
time-consuming than holistic evaluation. Thus, it is
recommended that holistic evaluation be used more often than analytic.

Self-evaluation is a means of teaching children the skills
usually exercised by the teacher. In the process of acquiring the
evaluative skills required to revise their own writing, children improve
their writing ability. Teachers need not feel guilty about
transferring a share of evaluation responsibility to children.
After all, revision is based on the premise that writers must learn
how to evaluate their own writing if they are to become mature
writers.

Peer-evaluation is an extension of self-evaluation. When
children apply the skills they have gained in evaluating their own
writing to the writing of their peers, they are merely extending
the arena of opportunity for learning how to write.

Certainly the benefits of these informal evaluation pro- 
cedures for the teaching of writing skills make them worthwhile 
in and of themselves. However, as cited earlier, there is also
strong evidence to suggest that evaluation and improvement of 
writing skills will also have a positive influence on reading skills. 

Standards for Evaluating Composing Skills for Narrative Writing

Low: No identifiable beginning, middle, or end. Story problem
unclear. Action and characters not developed or related. Essential
details missing or confusing. Story problem not solved or
resolution unrelated to events.
Middle: Beginning, middle, and end present, but not always
identifiable. Story problem presented but not completely developed.
Some conversational or descriptive details included. End may not
show logical resolution of problem.
High: Identifiable beginning, middle, and end. Characters
introduced and problem presented. Characters and problem
well-developed with appropriate conversational or descriptive
detail. Story ends with believable resolution of problem.

STORY SETTING
Low: Setting of the story not identifiable. Details inappropriate
and confusing.
Middle: Time and place of story are hinted at, but uncertain.
Further references to setting may be inconsistent with original
time or place.
High: Time and place of story clearly set. Specific details
related to setting given in appropriate context. Setting consistent
throughout.

Low: Characters not believable. Details related to character
development are inconsistent, inappropriate, or missing. Difficult
to distinguish one character from another. Action of characters
unrelated to problem.
Middle: Characters somewhat believable. Some descriptive or
conversational details given. Details may not develop character
personality. Action of characters not always related to problem.
Major and minor characters not clearly discernible.
High: Characters believable. Descriptive or conversational detail
develops character personality. Action of characters relates to
problem. Major characters more fully developed than minor ones.

Low: Conversation among characters haphazard, incomplete, or
muddled. Much of the conversation inappropriate to circumstances
and to personality of story characters. Conversation seems
unrelated to story being told.
Middle: Conversation sometimes appropriate to circumstances and to
characters. Conversation may reveal character personality or
relationships among characters. Conversation sometimes not clearly
related to story.
High: Conversation appropriate to story circumstances and to
personality of each character. Conversation used to reveal
character and develop interrelationships among characters.
Conversation clearly relates to story.

Low: Story idea is trite or otherwise uninteresting. Story lacks
plot or plot is vague. Story ends abruptly or reaches no definite
conclusion.
Middle: Story idea is interesting. Idea may lack freshness or
imaginativeness. Story has a plot. Plot may not be well-developed
or entirely consistent. Story ending may not be satisfying or
interesting.
High: Story idea is fresh or imaginative. Story plot is
well-developed, is consistent, and comes to a satisfying,
surprising, or otherwise highly effective ending.
Standards for Evaluating Composing Skills for Expository Writing

QUALITY OF IDEAS
Low: Most ideas vague, incoherent, inaccurate, underdeveloped, or
incomplete. Details often unrelated to topic. Nothing imaginative
or thoughtful about the ideas.
Middle: Unevenness in completeness and development of ideas. Most
ideas related to the topic, a few unrelated. Sound, but
unimaginative ideas.
High: Ideas relevant to the topic, fully developed, rich in thought
and imagination, and clearly presented.

QUALITY OF ORGANIZATION
Low: Introduction, development, and conclusion unclear. Emphasis of
major and minor points indistinguishable. Sentences and paragraphs
seldom related by transitions. Overall lack of coherence and
forward movement.
Middle: Introduction, development, or conclusion not easily
identified. Emphasis on major or minor points sometimes not
well-balanced. Transitions between sentences and paragraphs used,
but without consistency. Forward movement variable.
High: Introduction, development, and conclusion well-structured,
complete, and easily identified. Emphasis of major and minor points
well-balanced. Sentences and paragraphs clearly related by
transitions. Logical forward movement.

SELECTION OF WORDS
Low: Word selection inexact, immature, and limited. Figurative
language seldom used.
Middle: Word selection usually suitable and accurate. Over-used
words and cliches somewhat common. Figurative language may lack
freshness, when used.
High: Facility and flair in word selection. Writer experiments with
words in unusual and pleasing ways. Figurative language used, often
in interesting and imaginative ways.

STRUCTURE OF SENTENCES
Low: No variety in sentence structure, often only simple sentences
are used. Transitions limited to such words as then; conjunctions
to and. Awkward and puzzling sentences common. Run-on sentences
and fragments often appear.
Middle: Some variety in sentence length and structure. Transitions
used when necessary. Few sentence constructions awkward and
puzzling. Run-on sentences and sentence fragments appear, but do
not predominate.
High: Sentence length and structure varied. Sentences consistently
well-formed. Smooth flow from sentence to sentence. Run-on
sentences and sentence fragments rarely appear.

STRUCTURE OF PARAGRAPHS
Low: Topic sentences seldom used. Irrelevancies common. Order of
details haphazard. Little or no command of the four common
paragraph types.
Middle: Topic sentences usually stated. Irrelevancies uncommon.
Order of details usually suitable. Limited ability to use the four
common types of paragraphs.
High: Topic sentences stated and supported with relevant details.
Appropriate variety used in ordering details (chronological,
logical, spatial, climactic). Four types of paragraphs used when
appropriate (narrative, explanatory, descriptive, persuasive).



Standards for Evaluating Mechanical Skills for Narrative or Expository Materials

GRAMMAR AND USAGE
Low: Frequent errors in the use of nouns, pronouns, modifiers, and
verbs.
Middle: Grammatical conventions of inflections, functions,
modifiers, nouns, pronouns, and verbs usually observed. Grammatical
errors sometimes occur.
High: Grammatical conventions of inflections, functions, modifiers,
nouns, pronouns, and verbs observed. Grammatical errors infrequent.

PUNCTUATION
Low: End punctuation often used incorrectly. Internal punctuation
seldom used. Uncommon punctuation is almost never used correctly.
Middle: Sentences usually end with appropriate punctuation.
Internal punctuation used, with occasional errors. Uncommon
punctuation sometimes used, but often inaccurately.
High: Sentences consistently end with appropriate punctuation.
Internal punctuation and other less common punctuation usually
correctly used.

CAPITALIZATION
Low: First word of sentence often not capitalized. Pronoun I often
a small letter. Proper nouns seldom capitalized. Other
capitalization rules usually ignored.
Middle: First word of sentences nearly always capitalized. I always
capitalized. Well-known proper nouns usually capitalized. Other
capitalization rules used, but not consistently.
High: First word of a sentence and the pronoun I always
capitalized. Well-known proper nouns nearly always capitalized.
Good command of other capitalization rules regarding titles,
languages, religions, and so on.

SPELLING
Low: Frequent spelling errors. Shows a frustration spelling level
(less than 70%). Unable to improve spelling accuracy in edited work
without help. Misspellings often difficult to recognize as English
words.
Middle: Majority of words spelled correctly. Shows an instructional
spelling level (70 to 80%). Approaches 90% accuracy in edited work.
Misspellings approximate correct spellings.
High: Nearly all words spelled correctly. Shows an independent
spelling level (90%). Approaches 100% accuracy in edited work.
Misspellings close to correct spellings.

HANDWRITING/NEATNESS
Low: Handwriting difficult or impossible to read. Letters and words
crowded. Formation of letters inconsistent. Writing often
illegible.
Middle: Handwriting usually readable, but some words and letters
difficult to recognize. Some crowding of letters and words.
High: Handwriting clear, neat, and consistent. Forms all letters
legibly with consistent spacing between letters and words.




The mechanical skills for writing are essentially the same for expository and narrative materials;
therefore, only one chart is needed to describe the standards for evaluating either form of written work.



Informal Reading Inventories:
A Critical Analysis 



John J. Pikulski 
University of Delaware 
and 

Timothy Shanahan 

University of Illinois at Chicago Circle 

Given the apparent widespread popularity of the informal
reading inventory for reading evaluation, it seems appropriate to
evaluate the status of this major approach critically from time to
time. In a 1974 publication, Pikulski attempted to comprehensively
evaluate the available information about informal reading inventories
and to make suggestions as to the directions that future
research and inquiry might take. This paper is an attempt to look
at the amount of progress that has been made toward answering
some of the questions raised in that 1974 review, and to consider
some new issues which have arisen.

There will be a focus on several research studies which help
to answer questions about the reliability and validity of the procedures
used for informal reading evaluation. Issues of interrater
and alternate form reliability, criteria for establishing reading
levels, differences between miscue analysis and informal reading
inventory procedures, and the role of comprehension analysis are
considered. Only tentative conclusions can be offered in many of
these areas because of the limitations inherent in the available
research.

An informal reading inventory consists of a sequential
series of reading selections, graded in difficulty, which students
read and answer questions about, and a set of procedures for
analyzing the student's reading behavior in an instructional
situation. The instrument used for this analysis can be a published
inventory or it can be teacher constructed. Both forms of
informal reading inventories will be discussed in this paper, and
the reader is cautioned to keep the distinction between the two in
mind.


Reliability

No serious treatment of formal assessment devices, such as
a standardized group achievement test, would dare to omit a
discussion of reliability if its authors expected the test to be
accepted as a legitimate evaluation tool. However, it appears that
many textbooks and published inventories ignore the issue of
reliability when IRIs are the topic. This is unfortunate, as an
assessment instrument certainly cannot be useful if the results it
yields are unstable and affected by chance factors. Of course it
could be argued that informal measures do not require the same
level of reliability expected of formal tests because of the
possibility of multiple administrations and ongoing observation
of student behavior after the initial testing. For example, a
teacher might employ an IRI to place a student in a reading book
with an appropriate level of difficulty. Every time the student
receives instruction in that book there is an additional opportunity
to evaluate the accuracy of the initial test results. Although
such continued monitoring could go a long way toward overcoming
limitations in reliability, empirical data suggest that
teachers do not make such alterations of instructional placements
frequently (Austin & Morrison, 1961; Rosenbaum, 1960;
Weinstein, 1976).

Even given ongoing evaluation, nothing is gained from the
use of unreliable measures. The question of whether the results of
informal reading inventories are consistent or reliable is still
important. Unfortunately, a search of the literature reveals little
that is new in helping us to answer that question in an informed
way.

Interrater reliability. One form of consistency asks whether
different examiners using the same instrument to measure the same
thing will get the same results. This is called interscorer or
interrater reliability.

A 1975 study by Page and Carlson suggests that the results
from informal evaluations may be far from consistent. They
found that experienced reading specialists were not able to agree
very consistently on the quality of oral reading. In their study,
seventeen certified reading specialists listened to a tape-recorded
oral reading performance. The teachers were directed to mark all
miscues or errors, and to count them as they would in an informal
reading inventory. They were to indicate whether the passage
was at the student's independent, instructional, or frustration
level. Although these teachers listened to the same tape, six rated
the passage to be at the independent level, five said it was
instructional, and six said frustration level.
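
One way to put a number on the disagreement just described is the proportion of rater pairs who assigned the same level. The sketch below uses only the counts given above (six, five, and six of seventeen raters); the function name is ours, and pairwise agreement is an illustrative index, not a statistic reported by Page and Carlson.

```python
from math import comb

# Counts of level judgments from the study described above:
# seventeen specialists rating one taped oral reading.
ratings = {"independent": 6, "instructional": 5, "frustration": 6}


def pairwise_agreement(counts):
    """Proportion of all rater pairs that chose the same level."""
    n_raters = sum(counts.values())
    agreeing_pairs = sum(comb(c, 2) for c in counts.values())
    return agreeing_pairs / comb(n_raters, 2)


agreement = pairwise_agreement(ratings)
print(f"Pairwise agreement: {agreement:.2f}")  # 40 of 136 pairs, about 0.29
```

For comparison, two raters choosing among three levels purely at random would agree about one time in three, so agreement of this size is no better than chance.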

Similarly, in a study by Allington (1978), teachers were
found to be quite inaccurate in their analyses of a taped oral
reading performance. No specific reliability data were reported in
this study, but a large percentage of the teachers' errors appeared
to be such that consistency is doubtful. The analyses of these 57
teachers differed markedly (on the average about 28 percent)
from the number of errors actually on the tape.

However, studies by Lamberg (1975), Lamberg, Rodrigues,
and Douglas (1978), and Roe and Aiken (1976) are more encouraging.
Working with preservice teachers, they found that fairly
good accuracy could be achieved, even over a relatively short
period of time, if consistent, structured training techniques were
used. Undergraduates were able to significantly decrease the
number of errors they made in recording oral reading performance
and were also able to improve in determining whether a
deviation from the expected oral reading response was a reflection
of the speech patterns of children from Spanish-speaking
backgrounds.

The recency of the training appears to be an important factor
in the consistency of evaluation which can be derived from an
IRI. There appears to be a need for frequent posttraining checks
to insure consistency of evaluation. The fact that the only studies
in which consistent reading evaluations are found are studies in
which all teachers take part in the same training program also
suggests the strong possibility that reading personnel are exposed
to a wide variety of training procedures which influence
how they score and interpret informal reading inventories.




Alternate form reliability. A second form of reliability concerns
whether one would get similar results even when two different
forms of the same test are used. This reliability question
would seem directly answerable for the published informal reading
inventories, especially since most of them have several forms
of the test at various grade levels. The Classroom Reading Inventory,
for example, has three parallel sets of testing material
(Forms A, B, and C) for each level, preprimer through [illegible]
grade. Would one obtain the same results with Form C or B as one
would with A? Although the Classroom Reading Inventory is
now in its third edition, the question of reliability is not addressed
anywhere in the test materials. In general, it appears
now, as it did in 1974, that some authors of published informal
reading inventories do not feel a need to provide traditional
psychometric evidence for the reliability or validity of these
instruments.

The fairly recent Ekwall Reading Inventory is the only one
of the published inventories available to us which directly addresses
the subject of reliability. Ekwall (1979) reports a
"preliminary study" involving 40 subjects. The study seems a
study of alternate form reliability since Ekwall reports that the
correlation between Forms A and C, which were used to measure
oral reading performance, was .82 and the correlation between
Forms C and D was .79. However, Ekwall labels it a study of interscorer
reliability because one examiner gave the tests in
grades one through four and another gave the tests in grades five
through nine. It still appears to us that it is a study of alternate
form reliability. In any event the results are difficult, at best, to
interpret since Ekwall doesn't even report what it was that was
correlated. In addition, a reliability coefficient of only .79 is not
particularly impressive since a frequently accepted guideline for
an acceptable reliability coefficient for a test that is to be used for
individual diagnosis is .90.
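
To make the idea of an alternate form coefficient concrete, the following sketch computes a Pearson correlation between scores on two forms and compares it with the .90 guideline mentioned above. The pupil scores are hypothetical, invented only for illustration; they are not Ekwall's data, and, as noted, his report does not say what was correlated.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(spread_x * spread_y)


# Hypothetical instructional levels (in grades) for the same eight
# pupils tested on two forms of an inventory.
form_a = [1, 2, 2, 3, 4, 4, 5, 6]
form_c = [1, 2, 3, 3, 3, 5, 5, 6]

GUIDELINE = 0.90  # guideline for individual diagnosis cited in the text

r = pearson_r(form_a, form_c)
print(f"r = {r:.2f}; meets .90 guideline: {r >= GUIDELINE}")
```

A coefficient computed this way is only interpretable if the report states which scores were paired, which is the information missing from the preliminary study.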

Several studies done in the past few years have also raised
questions about the potential reliability for instructional
material IRIs. In several studies, Bradley and Ames (1976, 1977)
as well as Eberwein (1979), have presented evidence to suggest
that basal readers vary considerably in readability. If a book
designated as being at fourth reader level contains selections that
range in readability from first to eighth grade level, then a child's
reading performance at the fourth reader of a particular basal
series may vary considerably depending on the passage the
teacher or publisher selects to use in the informal reading inventory.
This poses a serious threat to the whole concept of informal
reading evaluation which suggests that the best way to tell
whether children can successfully receive instruction in a given
book is to ask them to read a small sample of that book. In their
1978 study, Bradley and Ames, after analyzing hundreds of passages
selected from popular basal reader series, found that passages
taken from a single level of a basal reader might vary in
readability from first to twelfth grade level. In an earlier study,
Bradley and Ames (1976) illustrated the effect that passage
variability within the same basal reader book could have on oral
reading performance. Students were found to be at a variety of
levels of proficiency, although all of the IRI passages had been
selected from a single basal reader.

In terms of variability of readability, at least some of the
published inventories do present an advantage. Johns, for example,
reports readability estimates for all of the passages used in
the Basic Reading Inventory using both the Fry and either the
Spache or Dale-Chall formulas. He presents the results for all
levels and all forms of the inventory, and the results indicate that
the readability is at or close to the designated levels of difficulty.
Ekwall reports using the Harris-Jacobson formula to adjust the
readability level of each passage to the midpoint of its designated
level.

While the published inventories do seem to present information
suggesting that passages are at their designated level of
readability, comments such as that by Ekwall raise suspicions.
Reviews of readability research (Klare, 197 1975) suggest that
readability formulae are reasonably good indices of difficulty of
material and they warn that readability formulae were never intended
as guides to the writing of materials. Simply shortening
sentences, and thus adjusting the readability designation of a
passage, may have little or no effect on the actual level of difficulty
of the passage (Hansell, 1976).

Error analysis. A frequent claim for informal evaluation is
that it can yield valuable information about strengths and
weaknesses that a person has in reading. It is not uncommon to
find that as a result of an informal reading inventory, a diagnostician
concludes there are specific skills in word recognition or
comprehension that a reader possesses and others that he or she
lacks. Is there evidence that such analyses can be made reliably?
No evidence concerning the reliability of such evaluations was
found. Spache (1976, p. 141) criticized both commercial and
teacher-made IRIs for the failure to "recognize that the number of
errors analyzed should be 75-100 for a reliable diagnosis.
Repeated testing to obtain such a sample may be required to be
certain that the remedial plan is formulated on a sound basis."
This statement seems to cast doubt on the possibility of a reliable
error analysis under typical circumstances, but Spache reaches
this conclusion on the basis of studies of spelling accuracy and
not studies of reading diagnosis (Spache, 1980). Spache's statement
raises the need for caution in the analysis of reading errors
or miscues. Future research should consider whether it is possible
to derive a reliable assessment of specific skills through the use of
traditional IRI methodology.

Validity and Criteria for Establishing Reading Levels 

The question of validity (that is, does a test measure what
it purports to measure) is difficult to address for any reading
test, but again it is a central, critical concept for any assessment
technique.

One validity issue surrounding reading inventories relates 
to the criteria recommended for establishing reading levels. 

More than a decade ago, William Powell (1970) seriously
challenged the traditional criteria for setting reading levels from
informal reading inventories. The traditional criteria are usually
attributed to Emmett Betts. Powell suggested that word recognition
criteria be adjusted depending on the grade level of the child
being evaluated informally. At first grade, for example, his
research suggested that only 83 percent oral reading accuracy be
required in order to establish an instructional level. The word
recognition accuracy recommended for an instructional level rose
successively at each grade level through sixth grade where 94
percent accuracy was required.

Unfortunately, little research has been done in an effort to
determine the appropriate criteria for the establishment of levels
since the review of informal assessment reported in 1974. Ekwall,
Solis, and Solis (1973) reported a study of third, fourth, and fifth
grade students who were given an informal reading inventory
while they were monitored by a polygraph. Since polygraphs (often
called lie detectors) are designed to measure anxiety, it was
felt that through the use of the polygraph record the experimenters
could discern the maximum amount of word recognition
and comprehension errors a child could tolerate before stress
and anxiety became apparent. Ekwall, Solis, and Solis failed to
find any significant differences in reading scores associated with
stress indicators that seemed related to the grade level of the
child being tested, as would be predicted from Powell's position.
Their data also suggested that the 90 percent word recognition
criterion for a frustration level was associated with indicators of
stress on the polygraph readings; this again challenges Powell's
suggestion that 91 percent word recognition accuracy is adequate
for an instructional level at third grade. This study, which is also
reported in the Reading Teacher (1974) by Ekwall, and the Journal
of Learning Disabilities (1976) by Davis and Ekwall, found
that the amount of word recognition and comprehension errors
that a reader can tolerate may also depend on level of intelligence,
on whether the child is an achieving reader, and on some personality
characteristics.

Since the available research seems limited, one might question
professional opinion about IRI criteria for setting levels. In
1971, Powell and Dunkeld commented on the almost astonishing
agreement of reading experts in accepting Betts' criteria in spite
of the lack of experimental evidence to support those criteria.
They found that among eleven authorities in the field only two
proposed seriously different criteria, and one of these was Powell
himself. We thought it might be interesting to see if the situation
had changed over the nearly ten years since Powell and
Dunkeld's report. We, therefore, selected from the shelves in our
offices the first eleven reading texts we came across that had
publication dates of 1978 or later and which discussed criteria for
informal reading inventories. We were as surprised as Powell
and Dunkeld had been with the agreement among reading professionals
about the criteria to employ. Again, the vast majority of
opinion suggests acceptance of Betts' criteria. Bond, Tinker, and
Wasson (1979), Dallman, Rouch, Chang, and DeBoer (1978),
Durkin (1978), Farr and Roser (1979), Hall, Ribovich, and Ramig
(1979), Ransom (1978), Roe, Stoodt, and Burns (1978), and Stauffer,
Abrams, and Pikulski (1978) all recommend setting reading
levels on the basis of the traditional criteria. Cheek and Cheek
(1980) also accept Betts' criteria as an equally good alternative.
They caution only that the diagnostician adopt one or the other
set of criteria, indicating that evidence doesn't lead to a
clear endorsement of either set. This was the only one of the
eleven texts that did suggest that Powell's criteria were acceptable.
Bader (1980) basically recommends the Betts criteria, with
minor changes in the criteria for comprehension performance, but
only in cases where silent reading precedes oral reading; otherwise,
she recommends lower standards. Harris and Sipay (1978)
suggest yet another set of criteria based on a 1952 study by
Cooper, which compared scores on an IRI with the amount of
reading test growth made over a year. Based on this study, Harris
and Sipay recommend that the most suitable word recognition
score for an instructional level in grades two and three is 99
percent, and word recognition scores of 97 to 99 percent for
intermediate grades. The comprehension criteria recommended for
an instructional level are 70 percent and up for second and third
grades, and 60 percent and up for the intermediate grades.
Interestingly, these criteria are more stringent at lower grades and
less so at higher grades, in direct contradiction to Powell's contention
that children can tolerate the greatest degree of error at the
lowest grades. It should be noted that Cooper's 1952 dissertation
and Dunkeld's 1970 study are unusual in that they base their
recommendations as to level setting criteria on the eventual progress
students made in reading. Additional studies of this nature
are needed.

Another place to look at professional opinion regarding
criteria for informally establishing reading levels is in the
published IRIs. Here, again, agreement is astonishingly consistent,
and is most accepting of the traditional IRI criteria. The
Classroom Reading Inventory, the Basic Reading Inventory, the
Content Inventories, and the Diagnostic Reading Inventory all
accept the traditional criteria; the Ekwall Reading Inventory also
adopts the traditional criteria except that 60 percent or more
comprehension is acceptable as an instructional level rather than
the traditional 75 percent or more score. The Sucher-Allred
departs somewhat from traditional criteria. For an instructional
level, the criteria are 92 to 96 percent accuracy for word
recognition and are 60 to 7§ percent for comprehension. Scores
below an instructional level are at a frustration level; scores
above the instructional level are acceptable for an independent
level.
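
The word recognition criteria just described for the Sucher-Allred can be written out as a simple decision rule. This is a minimal sketch under stated assumptions: the function name is ours, only the 92 to 96 percent band comes from the text, and scores falling between 96 and 97 percent are treated here as independent level, which the description above does not settle.

```python
def word_recognition_level(percent_accuracy):
    """Classify an oral reading accuracy score using the 92 to 96
    percent instructional band described above (illustrative only)."""
    if percent_accuracy < 92:
        return "frustration"      # below the instructional band
    if percent_accuracy <= 96:
        return "instructional"    # 92 to 96 percent accuracy
    return "independent"          # assumption: anything above 96 percent


for score in (90, 92, 96, 98):
    print(score, word_recognition_level(score))
```

Comprehension criteria would be applied alongside this rule in practice; they are left out of the sketch.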

The agreement with respect to criteria among authors of
published reading inventories is truly impressive; rarely do we
see five out of six of our colleagues in essential agreement about
anything. The agreement is even more impressive when we find
that in addition to the strong agreement on the criteria for
percentage scores, there is also widespread agreement on what
constitutes an error. All of the inventories, with the exception of
the Basic Reading Inventory, agree that omissions, insertions,
substitutions, mispronunciations, and repetitions constitute
errors. The Basic Reading Inventory departs seriously from the
other published inventories by not counting repetitions as errors
and by encouraging the examiner to count only "significant
miscues." After examiners count the total number of miscues,
they are directed to count the number of dialect miscues, all corrected
miscues, and all miscues that do not change meaning.
These "insignificant" miscues are not to be used for level setting.
Johns' recommendation reflects the fact that there is an inescapable
problem in weighing all errors equally. It was pointed
out in the 1974 review that it is unquestionable that there are
gradations of gravity in the types of errors made. It does seem
less serious when a child substitutes the word "fruit" for "apple"
as compared with not being able to attempt to pronounce the
word. However, it would seem on the surface that the procedure
advocated by Johns would yield substantially higher scores than
would the procedure advocated by any of the other published inventories;
yet Johns continues to advocate use of the traditional
criteria scores for establishing reading levels. We see this as a
potential problem since the traditional criteria were meant to apply
to the oral reading accuracy scores based on all errors. Hoffman
(1980), in an article which cautions against weighing the
errors in informal reading inventories according to miscue
analysis procedures, came to a conclusion which seems to warrant
careful consideration. He writes: "There is no question that
qualitative techniques of assessment such as miscue analysis are
a far richer source of information for the discerning teacher than
simple error counts. Qualitative techniques are revealing of ways
in which instruction might be adapted to meet specific students'
needs. It would appear advisable, however, that until such time
as we are able to demonstrate how qualitative analysis can better
meet demands for accurate placement of students in instructional
materials than simple quantitative analysis, we should try to
keep the two procedures as separate and distinct as the purposes
for which they are used" (p. 138). In addition, we wonder if there
might not be interscorer reliability problems stemming from differences
in judgment as to what is and is not a "significant
miscue."

It seems appropriate for all those who might consider
changing basic IRI procedures to conduct some research on the
effect that the changes might have in raising or lowering scores and
to then consider establishing criteria. For example, Ekwall (1974)
has suggested that repetitions not be counted as errors, and
Gonzalez and Elijah (1975) indicate that passages should be read
silently before oral reading analysis occurs. These changes are
not unreasonable, but they serve to raise scores and would
possibly lead to overplacement in reading materials. When
authorities recommend changes in procedures, they need to also
address whether criteria for level setting need to be revised.
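
The scoring concern raised here can be illustrated with a small
sketch. It assumes a hypothetical 100-word passage and an invented
list of recorded miscues; none of these figures come from the
inventories reviewed, and the class and function names are
illustrative only. The accuracy percentage is computed twice: once
counting all errors, and once excluding dialect, corrected, and
meaning-preserving miscues in the manner the Basic Reading
Inventory directs.

```python
from dataclasses import dataclass


@dataclass
class Miscue:
    """One recorded deviation from the text (hypothetical record format)."""
    kind: str                     # e.g. "omission", "substitution", "repetition"
    dialect: bool = False         # attributable to the reader's dialect
    corrected: bool = False       # self-corrected by the reader
    changes_meaning: bool = True  # alters the meaning of the passage


def accuracy(total_words: int, error_count: int) -> float:
    """Percentage of words read without a counted error."""
    return 100.0 * (total_words - error_count) / total_words


def is_significant(m: Miscue) -> bool:
    """A miscue is 'significant' only if it is not dialect-based,
    not corrected, and changes meaning."""
    return not m.dialect and not m.corrected and m.changes_meaning


def score_passage(total_words: int, miscues: list) -> tuple:
    """Return (accuracy counting all errors, accuracy counting significant only)."""
    significant = sum(1 for m in miscues if is_significant(m))
    return accuracy(total_words, len(miscues)), accuracy(total_words, significant)


# Invented miscue record for a hypothetical 100-word passage.
miscues = [
    Miscue("substitution", changes_meaning=False),
    Miscue("substitution"),
    Miscue("omission", corrected=True),
    Miscue("insertion", dialect=True),
    Miscue("mispronunciation"),
    Miscue("repetition", changes_meaning=False),
    Miscue("omission"),
    Miscue("substitution", corrected=True),
]

traditional, significant_only = score_passage(100, miscues)
print(traditional, significant_only)  # 92.0 97.0
```

In this invented case the same oral reading falls below a 95 percent
criterion when all errors are counted and above it when only
"significant" miscues are counted, which is the kind of shift that
would call for a re-examination of the criteria.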



Informal Reading Inventories and Miscue Analysis 

The Basic Reading Inventory is certainly not alone in
recommending a kind of psycholinguistic interpretation of informal
reading inventory results. In fact, a frequent recommendation
with regard to informal evaluation during the past decade
called for a wedding of informal methodology with miscue
analysis, especially with respect to the interpretation of oral
reading performance. A frequently heard criticism of informal
reading inventories is that they stress the quantitative rather
than the qualitative aspects of an oral reading performance. As
Weaver and Smith (1979, p. 103) put it, "The major problem . . .
is that many versions of the IRI encourage teachers to look
primarily at the quantity of a reader's errors rather than the quality.
Such a procedure may lead teachers to underestimate children's
reading strengths and/or to prescribe inappropriate skills
lessons." Though advocates of informal evaluation usually urge
that the nature and severity of the errors be taken into
consideration, they are usually sketchy in their description of just how
this qualitative analysis should be undertaken. Miscue analysis, with
its identified categories for analyzing oral reading errors, seemed to
many to be a natural accompaniment to informal evaluation. The
use of traditional IRI numerical criteria could allow for the setting
of independent, instructional, and frustration levels while more
careful diagnostic observations regarding reading strengths and
weaknesses might take place through miscue analysis. As
Williamson and Young (1974) put it, "The power of the diagnosis
made by using the Informal Reading Inventory and the Reading
Miscue Inventory is increased if the concepts from both these
techniques are synthesized. The IRI is an informal procedure for
determining error count, four reading levels. . . . The RMI focuses
on the quality of a reader's errors."

One immediate obstacle to the marriage, however, appears
to be the alleged impracticality of miscue analysis for classroom
use. It is frequently estimated that administration of a reading
miscue inventory following the guidelines offered by Goodman
and Burke (1976) takes well over an hour. In response to this
criticism, articles such as those of Bean (1979), Christie (1979),
Siegel (1979), and Tortelli (1976) proposed simplified procedures
which were designed to shorten the amount of time needed to
make a systematic qualitative interpretation of an oral reading
performance. Christie suggested a two-step procedure wherein
oral reading deviations from the text were first analyzed for their
graphic similarity to the original text and as to whether they were
semantically acceptable and self-corrected. The second step
called for a summarization of this information in terms of the
predominant strategies the reader employed.

We were able to locate no information in the literature as to
how widespread the systematic use of a miscue analysis of
informal reading inventory results has become. Our very informal
observation, based on discussions with teachers and reading
specialists, is that users of informal evaluation largely continue
to rely on an "eyeballing" of the oral reading
notations and to base their judgments on these relatively
unsystematic analyses.

There is another difficulty that has been unearthed that
presents problems for merging informal and miscue analysis
procedures, a problem that may, in fact, be a general one for miscue
analysis. This stems from the fact that there is a growing body of
information which strongly suggests that the types of oral
reading errors or miscues that are made are dependent on the level of
difficulty of the material that is being read. As a reader moves
from material that is only mildly or moderately challenging to
material that is difficult, the type of oral reading errors or
miscues that are made changes.

One very consistent finding is that as a reader goes from
reading materials that are at an instructional level to materials
that are at a frustration level, there is a change in the type of
error made, with a strong tendency to make less use of meaning
and context clues (Christenson, 1969; Kibby, 1979; Leslie & Osol,
1974; and Williamson & Young, 1974). For example, Kibby used
RMI procedures for coding each deviation from text in terms of
its grammatical acceptability, semantic acceptability, and
whether the miscue was corrected on the basis of the
interrelationship of these dimensions. The reader was classified as having
a strength, a partial strength, or a weakness in grammatical
relationships. Using a population of fourth, fifth, sixth, and seventh
grade disabled readers, he found that 4 percent of the students
demonstrated a strength in grammatical relationships when
reading a passage from the Spache Diagnostic Reading Scales
that was judged too difficult to meet the standards for being at
an instructional level, but a full 74 percent demonstrated this
strength when reading a passage where instructional criteria
were met. Similarly, Leslie and Osol (1974) found a significant
difference in the number of uncorrected errors that
resulted in a loss of meaning, depending on whether the eighth
grade students, who were subjects in this study, were reading
instructional level material or material that was more difficult than
instructional level. When they were reading materials with 95 to
99 percent accuracy, they were significantly more likely to
correct errors that produced a loss of meaning than when they read
materials with 90 to 94 percent accuracy. Similar findings were
obtained by Williamson and Young (1974) using elementary
grade subjects who were reading at fifth grade level. They had
students read from both basal readers and science materials. A
study by Negin, reported by Pearson (1978), suggested
approximately a 15 to 30 percent drop-off in the use of context at the
frustration level from that appearing at the instructional level.
This study also indicates that students frequently are unable to
read known words (i.e., words they could read in isolation) in the
context of frustration level material.

There does seem to be a fair amount of evidence to suggest
that the pattern of errors that students demonstrate and the oral
reading strategies that they employ will change with the level of
challenge that the materials present to the reader. The more
difficult the material, the less likely readers are to employ
meaning and context clues, and the less likely they are to correct errors
that detract from the meaning of the passage being read.

The implications of these findings seem twofold for
informal evaluation. 1) It would seem to be inappropriate to group
together all miscues and errors. It seems necessary to analyze
them according to whether they occur in material that is or is not
at a subject's instructional level (a procedure which indicates the
need for both qualitative and quantitative analyses). 2) The practice
of using difficult materials for oral reading evaluation seems
questionable, at best. Thus, the advice of Goodman and Burke (1976,
p. 20) that materials to be used for constructing a reading miscue
inventory be "one grade level above that which is usually
assigned in class" may be inappropriate. It seems likely that a difficult
passage such as suggested by Goodman and Burke will limit the
extent to which a reader can employ language and context clues
and will force an overreliance on graphic clues.

In addition to these implications, this fairly recent research
on the changes that occur in the pattern of oral reading errors or
miscues also seems to provide some added support for the
traditional criteria since in several of the studies, readers began to
become inefficient and began reading mechanically, rather than
for meaning, as their performance dropped below 95 percent
accuracy in word recognition.

Comprehension Analysis and the
Informal Reading Inventory

The role of comprehension evaluation in the IRI was
discussed briefly in the previous review. The brevity of the
treatment was due to a dearth of available inquiry into informal
comprehension assessment at that time. We assumed that the
explosion of comprehension research which has occurred over the past
few years would have led to an increase in research efforts
concerning informal evaluation. Our search of the literature failed to
uncover much in this regard. We have chosen to discuss, if only
briefly, what we have found on this topic not because this
material is sufficient to answer the important question
concerning comprehension evaluation, but because it raises some issues
which require further attention.

Both commercially published and teacher-constructed IRIs
usually employ five to ten questions per passage to evaluate
reading comprehension. The questions serve to direct students to
read for meaning, and performance on the questions is actually
used to assist in level setting. Sometimes specific error analysis,
by question type, is recommended. These questions are often
designed according to published guidelines (Johnson & Kress,
1965; Valmont, 1972). Recent research on the efficacy of such
guidelines for designing appropriate questions has implications
for IRI construction.

Davis (1978), for example, examined the ability of
questions created by secondary teachers to discriminate between
good and poor readers and between levels of difficulty. She
reports that "as a whole, the set of inventory questions operates
appropriately by demonstrating expected differences among the
subjects and the graded passages" (p. 15). She also reported,
however, that the individual questions, especially vocabulary
questions, did not have high discriminatory power in
distinguishing good and poor readers. Because of the limitations
in the design of individual questions, Davis recommends a
rethinking of the practice of encouraging teachers to construct
IRIs, though her criticisms are probably equally valid for
commercially published IRIs.

More problematical are the findings of Greenlaw and
Peterson (cited in Peterson, Greenlaw, & Tierney, 1978) who reported
that teachers, using each of three popular sets of
question-construction guidelines, arrived at very different sets of IRI
questions. That is, none of the popular guidelines used in this study
were sufficiently well-defined to result in the creation of identical
question sets. The impact of such differences upon instructional
placement was demonstrated in another study (Peterson,
Greenlaw, & Tierney, 1978). Utilizing one IRI with three different
sets of questions designed according to a single set of guidelines,
the reading skills of 57 children, grades two through five, were
evaluated. The correlation of the reading placements derived
from the three sets of questions ranged from .78 to .83. Different
questions, in other words, result in the attribution of different
reading level designations.

Thus, question variability leads to the attribution of
different reading levels for the same reading behavior. This presents
a problem for informal reading inventory construction of both the
teacher-made and commercial varieties. Investigation is needed
to find out whether it is possible to specify question writing
criteria which will allow maximum discriminability of questions
and which will lead to more consistent reading level
designation.
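
A brief sketch may help show why a correlation in the range
reported above still leaves room for different placements. The
placement values below are invented for ten hypothetical children
and are not data from Peterson, Greenlaw, and Tierney; the
function names are likewise illustrative. The code computes a
Pearson correlation between the levels assigned under two question
sets and counts how many children received the same level under
both.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def same_placements(x, y):
    """Number of children assigned the same level by both question sets."""
    return sum(1 for a, b in zip(x, y) if a == b)


# Invented instructional levels (grade equivalents) for ten hypothetical
# children, as assigned from two different question sets on the same passages.
question_set_a = [2, 2, 3, 3, 4, 4, 5, 5, 3, 2]
question_set_b = [2, 3, 3, 4, 4, 3, 5, 4, 3, 2]

r = pearson(question_set_a, question_set_b)
agree = same_placements(question_set_a, question_set_b)
print(f"r = {r:.2f}; same placement for {agree} of {len(question_set_a)} children")
```

With these invented values the two sets of placements are strongly
and positively related, yet four of the ten children would be
assigned a different level depending on which questions were asked.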

Teacher Constructed vs Published
Informal Reading Inventories

Throughout this review we have, as pointed out in the
introductory section, discussed two somewhat different forms of
informal reading inventories. One form is that of teacher
constructed informal reading tests and a second is published
informal reading inventories. There is a serious question as to how
these two types of IRIs compare. In our search of the literature
for this review, we were unable to find a single study which
addressed this issue; therefore, we felt it appropriate to undertake a
study aimed at providing at least a preliminary answer as to how
the two types of IRIs compare. We were also somewhat interested
in how these two forms of evaluation compared with a widely
used, more standardized type instrument, notably the reading
section of the Wide Range Achievement Test (WRAT).

Subjects 

The subjects of this study were 33 students who were
evaluated as part of the diagnostic service of the Reading Center at
the University of Delaware. They represented a wide range in
terms of age and reading ability. The mean age was 9-11 with a
range from 7-2 to 15-11; average grade placement was 3.9, with a
range from first through ninth; reading instruction levels ranged
from preprimer through sixth. While all subjects had been
referred for diagnosis because of a suspected reading problem, eight or
24 percent of the subjects were diagnosed as having a reading
level that was at their grade placement. (This is not to imply that
they were reading at an appropriate level. While one might
expect a child with above average intelligence to read above grade
level, for purposes of this study it did not seem imperative to
address this issue.) Thus, the study looks at students reading both
at or below grade placement. None of the students tested was
reading above grade placement.

Test Materials

As part of a larger diagnostic battery, each subject in this
study was given each of the following reading tests: the Reading
Section of the Wide Range Achievement Test (Jastak & Jastak,
1978), the Basic Reading Inventory (Johns, 1978), and a clinician
constructed informal reading inventory. To accomplish the last
measure, students coming for the evaluation were requested to
bring with them a copy of the reading text they were currently
using in school. From this text, the clinician responsible for
conducting the diagnosis of this youngster constructed an IRI
following the directions provided in Stauffer, Abrams, and Pikulski
(1978). All clinicians had received at least one month's training in
the construction and administration of tests. All were Master's
degree candidates working on a full time basis at the Reading
Center. In addition, because of the questions raised previously in
this chapter about the interpretation of the results of the Basic
Reading Inventory, the procedures for recording, scoring, and
calculating the results of that measure were the same as those
employed with the teacher-constructed IRI.

Results

Because of the preliminary nature of the study, complex
analyses of the data seemed inappropriate. Instead, the data were
analyzed in a simple, straightforward fashion to answer the
following questions:

1. How did the average grade level score for the teacher-constructed
   IRI, the published IRI, and the WRAT compare?

   There was an outstanding amount of agreement between
   the two forms of IRI. The average grade score for
   the clinician constructed instrument was 1.88 while the
   average grade score for the published IRI was 1.70; both
   were approximately second grade level. In sharp contrast,
   the average grade score on the Reading Section of
   the WRAT was 4.01 or fourth grade, more than two
   grade levels above the scores obtained from the IRIs.

2. How frequently would students be placed at the same
   instructional reading levels by the three measurement
   instruments and how frequently would the results vary
   by one grade level or more?

   Here again the results of the two IRIs are remarkably
   consistent: 22 of the total population of 33 (67 percent)
   students were placed at the same instructional level; the
   remaining 11 or 33 percent were within one grade level
   of each other. There was some tendency for the clinician
   constructed IRI to yield somewhat higher scores. Of the
   11 students who were within one year of each other,
   eight (24 percent) of the total of 33 scored one grade
   level higher on the clinician constructed IRI, while only
   three (9 percent) of the total population scored one year
   higher on the published IRI than they did on the
   clinician constructed version. As one might expect from the
   average grade scores reported earlier, there was not
   nearly so close an agreement between the WRAT and the
   IRIs. When compared to the clinician constructed IRI,
   the two measures never placed students at the same
   grade level. The WRAT score was one grade lower for one
   child or 3 percent of the population. By far the
   outstanding tendency was for the WRAT to yield much higher
   scores. It placed 12 (37 percent) of the students one
   grade level higher than did the clinician constructed IRI;
   10 students (30 percent) were placed three levels higher
   on the WRAT than on the clinician constructed IRI.

   The results obtained when comparing the instructional
   levels for the WRAT and the published IRI were
   remarkably similar to those just discussed; therefore,
   these results will not be reported in order to conserve
   space.
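
The agreement percentages reported for the two IRIs can be
re-derived with a short tabulation. The sketch below rebuilds a list
of level differences (clinician constructed minus published) from
the counts given above; the individual students' scores are not
reported, so the list itself is a reconstruction, and the variable
names are illustrative.

```python
# Level differences (clinician constructed IRI minus published IRI),
# reconstructed from the reported counts: 22 students at the same level,
# 8 one level higher on the clinician constructed IRI, 3 one level lower.
differences = [0] * 22 + [1] * 8 + [-1] * 3


def percent(count, total):
    """Whole-number percentage of the total population."""
    return round(100 * count / total)


total = len(differences)
same_level = sum(1 for d in differences if d == 0)
clinician_higher = sum(1 for d in differences if d == 1)
published_higher = sum(1 for d in differences if d == -1)
one_level_apart = clinician_higher + published_higher

summary = {
    "same level": percent(same_level, total),
    "one level apart": percent(one_level_apart, total),
    "clinician IRI one level higher": percent(clinician_higher, total),
    "published IRI one level higher": percent(published_higher, total),
}

for label, value in summary.items():
    print(f"{label}: {value} percent")
```

The same tabulation could be applied to any pair of instruments for
which each student's two placements are available.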

Discussion

The authors of this chapter were extremely surprised by
the high degree of agreement between the two types of IRIs.
Based upon clinical observations of test data, we had anticipated
far less agreement; the results were thus extremely encouraging.

It should be pointed out that these results were obtained
using the same recording, scoring, and interpretation procedures
with both the clinician constructed and published IRIs and that
the measures were administered by the same clinician. All levels
were set only after review by the faculty supervisor of the
Diagnostic Service. It seems almost certain that if the procedures
outlined by the author of the published IRI had been followed,
substantially less agreement would have been achieved. Thus,
these results largely suggest that if similarly trained clinicians
use agreed upon procedures and criteria, they can obtain very
similar results with respect to setting an instructional level
regardless of whether they construct their own IRI or use
published materials.

As indicated previously, we had not anticipated so close an
agreement. One possible reason for this expectation is that at times
the differences obtained in actual reading performance were
dramatic, even though the same instructional level was established
using both IRIs. For example, in one case a second grade child
who scored below a preprimer instructional level achieved an oral
reading accuracy score of 92 percent when reading an IRI
selection based upon the preprimer instructional materials being used
in school; the 92 percent score also represented a rather labored
oral reading and, therefore, the child was judged to fall below
standards even at a preprimer level. However, when asked to
read the preprimer passage from the published reading
inventory, this child achieved an oral reading score of only 32 percent.
Though the instructional level was still below the preprimer, the
two oral reading performances were dramatically different.
Visual inspection of the actual scores from the two IRIs suggested
that children who were reading at a first reader level or below
were more likely to do better on an informal reading inventory
based on their instructional materials than on some general,
published IRI. Given the fact that instructional materials vary
considerably in the vocabulary and skills that they introduce,
especially in the earliest levels of the programs, this is hardly a
surprising finding. It does suggest, however, that the use of a
general, published IRI with beginning readers may not reflect the
specific vocabulary and skills that they have mastered in their
program of reading instruction. It may, on the other hand, give a
good estimation of the general or functional reading skills
mastered by the student.

The results of this preliminary study confirm the
impressions of many reading specialists that the reading section of the
WRAT seriously overestimates a child's instructional level. While
the WRAT may have utility as a quick, gross screening device,
these results suggest that approximately one out of three times it
will overestimate a child's instructional level by as much as three
grade levels.

One final observation seems in order. There was a
substantial amount of agreement between teacher judgment as reflected
in the book placement of the subjects tested and the results of
both IRIs. For example, when the instructional level from the
clinician constructed IRI was compared with the level at which
the child was actually receiving instruction, the grade level was
the same for 21 (64 percent) of the 33 subjects. Three (9 percent)
children were underplaced, that is, their instructional level
established by the IRI was a year higher than the level of the book in
which they were receiving instruction. Nine of the children (27
percent) were overplaced according to these results. Seven of
them (21 percent) were placed in books one level above the
instructional level established by our testing, and two (6 percent)
were in books two levels above their established instructional
level. Given the fact that there is likely to be some degree of error
in our measurement, and the difficulties involved in interpreting
reading test results, teacher judgment for this group of youngsters
appears to be accurate to an encouraging degree.

Summary and Conclusions

Based on the review that has just been made, the following
conclusions seem in order with respect to the use and
interpretation of the IRI.

1. Published IRIs in particular should provide information
   about alternate form and test-retest reliability.
   Research is needed to indicate the reliability of judgments
   regarding specific skill strengths and weaknesses
   derived from an IRI performance.

2. Because a given reading text may contain selections
   which vary considerably in readability, teachers and
   other diagnosticians should carefully choose selections
   when constructing an informal reading inventory. When
   using an IRI that was constructed by a publisher to
   accompany its reading materials, users should critically
   ask if it is similar in content and skill demands to the
   materials that are being considered for use with the
   child being tested.

3. Though the empirical support for the use of the
   traditional criteria for establishing independent,
   instructional, and frustration levels is exceedingly weak,
   professional opinion is very supportive of their acceptance.
   Until more complete, more convincing, and more
   consistent research results suggest adoption of some other set
   of criteria, it seems best to employ those generally
   attributed to Betts.

4. Errors or miscues should be analyzed both qualitatively
   and quantitatively. Miscue analysis or some simplified
   adaptation of it seems a reasonable framework for a
   qualitative analysis.

5. The qualitative analysis of oral reading errors or
   miscues should focus on the deviations from text that
   take place at or very near a child's instructional level if
   these are to be used to make recommendations for
   instruction.

6. Until more research results are available, it seems
   unwise to calculate an oral reading accuracy score that
   takes into account the psycholinguistic properties of the
   miscue or error. It would seem that new criteria for
   reading levels would need to be developed based on such
   an analysis. Calculating an oral reading accuracy
   score based on the psycholinguistic properties of the
   error would seem to alter the traditional criteria in an
   unknown fashion. The same can be said of any major
   changes in basic IRI procedures which lead to alterations
   of student performance levels; such changes would seem
   likely to require similar adjustments in the traditional
   criteria.

7. More studies are needed which attempt to establish the
   validity of informal evaluation and the criteria that
   should be used in establishing levels by determining
   how well IRI results predict the amount of progress that
   children make in reading.

8. Future efforts need to be directed towards the design of
   question writing guidelines which will allow the creation
   of more discriminable questions which result in a stable
   attribution of reading levels.

There are many questions that remain unanswered and
issues that remain unresolved with respect to the use of the
informal reading inventory. It seems likely that this will not diminish
the popularity of the approach since many, perhaps most, of those
issues are not unique to informal reading inventories, but are
shared by other approaches to reading evaluation. The strength
of the IRI very likely lies in the close match that it can allow
between testing and teaching. Because we see this as the central
characteristic of IRIs, we also see a guiding principle for how to
decide on the details of administering, scoring, and interpreting
IRIs: do things the way you would do them when teaching
reading.